Removing the big batman lock

Message ID 20091129200950.GA19275@pandem0nium (mailing list archive)
State Superseded, archived
Headers

Commit Message

Simon Wunderlich Nov. 29, 2009, 8:09 p.m. UTC
  The orig_hash_lock locks big sections of the batman-adv code base, which
might lead to unneccesary performance degradation at some points. 

Therefore this patch moves the locking from the whole originator hash
table to individual orig nodes, introducing a reference count based
system to identify whether a node is still in use or can be free()d.

To summarize the changes:
 * when iterating, the hash is only locked while an item is picked
   from the hash. Access to the orig_nodes themselves is no longer 
   protected by orig_hash_lock.
 * Each orig_node is referenced when it is used or referenced (e.g. as
   neighbor), and freed when there are no references left. This makes
   it neccesary to carefully reference and unreference nodes.
 * orig_nodes receive an individual spin lock to protect access to
   their data.

I did some testing, including loading, unloading, killing individual
nodes etc, which seems to be clean so far. However there might be 
more race conditions introduced by this large patch, and i'm therefore
requesting a careful review before committing.

Signed-off-by: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
---
  

Comments

Andrew Lunn Nov. 30, 2009, 10:09 a.m. UTC | #1
On Sun, Nov 29, 2009 at 09:09:50PM +0100, Simon Wunderlich wrote:

> I did some testing, including loading, unloading, killing individual
> nodes etc, which seems to be clean so far. However there might be 
> more race conditions introduced by this large patch, and i'm therefore
> requesting a careful review before committing.

Hi Simon

I've not done a careful review yet, just a quick look.

Two idea for improvements:

In the purge code, add some debug code which looks at the refcount
value. If it is not between 1 and 5, prinkt() a warning what there is
probably a missing refcount operation. Since the purge code does not
run very often, it should not add much overhead, yet it is still a
useful debug tool.

The following bit of code happens quite a lot:

while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {


There is also a comment about having to free hashit, if you don't
iterate to the end of the hash. How about refactoring this, more like
the linux list.h.

Make hashit a stack variable, with an init macro:

#define HASHIT(name) struct hash_it_t name = { .index = -1, .bucket = NULL, \
	                                       .prev_bucket=NULL,           \
 					       .first_bucket=NULL }

and a macro for iterating over the hash
    
    HASHIT(hashit);

    orig_hash_for_each(orig_node, hashit) {

    foo(orig_node);
    }


    Andrew
  
Linus Lüssing Dec. 3, 2009, 1:31 a.m. UTC | #2
Hi Simon,

I just gave your patch a try on my laptop and could successfully,
reproduceably crash my kernel in the following way:
Setting up wifi to ad-hoc mode and connecting it to other
batman-wifi-nodes, insmodding batman-adv on my laptop and adding
this wifi interface to batman -> kernel hangs (see the two
attachements for more detailed error messages thrown by the kernel).

I'm not an expert in this, just a guess: Could it be, that
purge_orig() is executing the spinlock first and calling
free_orig_node() then, which tries to lock the same variable
again, resulting into a deadlock?

Cheers, Linus
[21060.326349] batman-adv:Adding interface: wlan1
[21060.337123] batman-adv:Interface activated: wlan1
[21125.872005] BUG: soft lockup - CPU#0 stuck for 61s! [bat_events:5157]
[21125.872007] Modules linked in: batman_adv(-) tun nvidia(P) uinput ppdev lp parport sco bridge stp bnep rfcomm kvm_intel kvm acpi_cpufreq cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats l2cap fuse dm_snapshot dm_mirror dm_region_hash dm_log firewire_sbp2 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss uvcvideo snd_mixer_oss arc4 ecb btusb snd_pcm snd_seq_midi snd_rawmidi videodev snd_seq_midi_event snd_seq v4l1_compat snd_timer iwlagn v4l2_compat_ioctl32 snd_seq_device iwlcore bluetooth snd i2c_i801 mac80211 soundcore i2c_core acer_wmi button processor ac battery snd_page_alloc cfg80211 psmouse evdev pcspkr serio_raw wmi rfkill ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod ide_cd_mod sd_mod crc_t10dif cdrom ata_generic ide_pci_generic mmc_block ahci uhci_hcd libata piix ide_core sdhci_pci sdhci tg3 ricoh_mmc libphy firewire_ohci firewire_core crc_itu_t scsi_mod mmc_core led_class intel_agp video output ehci_hcd thermal fan thermal_sys [last unloaded: batman_adv]
[21125.872007] CPU 0:
[21125.872007] Modules linked in: batman_adv(-) tun nvidia(P) uinput ppdev lp parport sco bridge stp bnep rfcomm kvm_intel kvm acpi_cpufreq cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats l2cap fuse dm_snapshot dm_mirror dm_region_hash dm_log firewire_sbp2 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss uvcvideo snd_mixer_oss arc4 ecb btusb snd_pcm snd_seq_midi snd_rawmidi videodev snd_seq_midi_event snd_seq v4l1_compat snd_timer iwlagn v4l2_compat_ioctl32 snd_seq_device iwlcore bluetooth snd i2c_i801 mac80211 soundcore i2c_core acer_wmi button processor ac battery snd_page_alloc cfg80211 psmouse evdev pcspkr serio_raw wmi rfkill ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod ide_cd_mod sd_mod crc_t10dif cdrom ata_generic ide_pci_generic mmc_block ahci uhci_hcd libata piix ide_core sdhci_pci sdhci tg3 ricoh_mmc libphy firewire_ohci firewire_core crc_itu_t scsi_mod mmc_core led_class intel_agp video output ehci_hcd thermal fan thermal_sys [last unloaded: batman_adv]
[21125.872007] Pid: 5157, comm: bat_events Tainted: P           2.6.31-1-amd64 #1 ��������������� 
[21125.872007] RIP: 0010:[<ffffffff812de34a>]  [<ffffffff812de34a>] _spin_lock+0xf/0x1b
[21125.872007] RSP: 0000:ffff88007dd99d98  EFLAGS: 00000297
[21125.872007] RAX: 0000000000000008 RBX: ffff88004f878b40 RCX: 0000000000000001
[21125.872007] RDX: 0000000000000007 RSI: ffffffffa0e8f02b RDI: ffff88004f878b90
[21125.872007] RBP: ffffffff8101166e R08: ffff88007d5eca20 R09: 000000000000000f
[21125.872007] R10: 0000000000000020 R11: 0000000000000000 R12: 0000000000000000
[21125.872007] R13: ffff88007d9d0000 R14: ffff88005f1f2c80 R15: ffff880070052000
[21125.872007] FS:  0000000000000000(0000) GS:ffff880001643000(0000) knlGS:0000000000000000
[21125.872007] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[21125.872007] CR2: 00000000f39dc000 CR3: 0000000001001000 CR4: 00000000000026f0
[21125.872007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21125.872007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[21125.872007] Call Trace:
[21125.872007]  [<ffffffffa0e8c78a>] ? free_orig_node+0x26/0x84 [batman_adv]
[21125.872007]  [<ffffffffa0e8ca18>] ? purge_orig+0x198/0x1d8 [batman_adv]
[21125.872007]  [<ffffffffa0e8c880>] ? purge_orig+0x0/0x1d8 [batman_adv]
[21125.872007]  [<ffffffff8105b43b>] ? worker_thread+0x174/0x211
[21125.872007]  [<ffffffff8105f39e>] ? autoremove_wake_function+0x0/0x2e
[21125.872007]  [<ffffffff8105b2c7>] ? worker_thread+0x0/0x211
[21125.872007]  [<ffffffff8105f042>] ? kthread+0x8b/0x93
[21125.872007]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[21125.872007]  [<ffffffff8105efb7>] ? kthread+0x0/0x93
[21125.872007]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 4412.876146] wlan1: Trigger new scan to find an IBSS to join
[ 4417.816058] wlan1: Trigger new scan to find an IBSS to join
[ 4420.102974] wlan1: Creating new IBSS network, BSSID 1a:f2:51:40:bc:e9
[ 4421.761168] wlan1: Selected IBSS BSSID 02:22:b0:44:94:5d based on configured SSID
[ 4433.902832] batman-adv:B.A.T.M.A.N. advanced 0.2.1-beta r1489 (compatibility version 8) loaded
[ 4440.853737] batman-adv:Adding interface: wlan1
[ 4440.865090] batman-adv:Interface activated: wlan1
[ 4441.410248] NOHZ: local_softirq_pending 08
[ 4506.496007] BUG: soft lockup - CPU#0 stuck for 61s! [bat_events:13855]
[ 4506.496007] Modules linked in: batman_adv nvidia(P) uinput ppdev lp parport sco bridge stp rfcomm bnep kvm_intel kvm acpi_cpufreq cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats l2cap fuse dm_snapshot dm_mirror dm_region_hash dm_log firewire_sbp2 loop arc4 ecb iwlagn iwlcore acer_wmi snd_hda_codec_realtek mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq uvcvideo snd_timer snd_seq_device videodev snd v4l1_compat soundcore i2c_i801 btusb v4l2_compat_ioctl32 snd_page_alloc bluetooth cfg80211 wmi rfkill i2c_core pcspkr psmouse evdev ac processor button serio_raw battery ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod ide_cd_mod sd_mod cdrom crc_t10dif ata_generic ide_pci_generic ahci libata uhci_hcd piix ide_core scsi_mod tg3 libphy sdhci_pci sdhci ricoh_mmc firewire_ohci firewire_core crc_itu_t mmc_core led_class intel_agp video output ehci_hcd thermal fan thermal_sys [last unloaded: batman_adv]
[ 4506.496007] CPU 0:
[ 4506.496007] Modules linked in: batman_adv nvidia(P) uinput ppdev lp parport sco bridge stp rfcomm bnep kvm_intel kvm acpi_cpufreq cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats l2cap fuse dm_snapshot dm_mirror dm_region_hash dm_log firewire_sbp2 loop arc4 ecb iwlagn iwlcore acer_wmi snd_hda_codec_realtek mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq uvcvideo snd_timer snd_seq_device videodev snd v4l1_compat soundcore i2c_i801 btusb v4l2_compat_ioctl32 snd_page_alloc bluetooth cfg80211 wmi rfkill i2c_core pcspkr psmouse evdev ac processor button serio_raw battery ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod ide_cd_mod sd_mod cdrom crc_t10dif ata_generic ide_pci_generic ahci libata uhci_hcd piix ide_core scsi_mod tg3 libphy sdhci_pci sdhci ricoh_mmc firewire_ohci firewire_core crc_itu_t mmc_core led_class intel_agp video output ehci_hcd thermal fan thermal_sys [last unloaded: batman_adv]
[ 4506.496007] Pid: 13855, comm: bat_events Tainted: P           2.6.31-1-amd64 #1 ���������������
[ 4506.496007] RIP: 0010:[<ffffffff812de350>]  [<ffffffff812de350>] _spin_lock+0x15/0x1b
[ 4506.496007] RSP: 0018:ffff88005d4cfd98  EFLAGS: 00000297
[ 4506.496007] RAX: 0000000000000008 RBX: ffff88005d990440 RCX: 0000000000000001
[ 4506.496007] RDX: 0000000000000007 RSI: ffffffffa101502b RDI: ffff88005d990490
[ 4506.496007] RBP: ffffffff8101166e R08: ffff88005d507200 R09: 000000000000000f
[ 4506.496007] R10: ffff880001652f20 R11: ffffffff81223a78 R12: 0000000000000000
[ 4506.496007] R13: 0000000000000286 R14: 0000000000000014 R15: ffff88005d4cfd58
[ 4506.496007] FS:  0000000000000000(0000) GS:ffff880001643000(0000) knlGS:0000000000000000
[ 4506.496007] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 4506.496007] CR2: 00000000f77d1000 CR3: 0000000001001000 CR4: 00000000000026f0
[ 4506.496007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4506.496007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4506.496007] Call Trace:
[ 4506.496007]  [<ffffffffa101278a>] ? free_orig_node+0x26/0x84 [batman_adv]
[ 4506.496007]  [<ffffffffa1012a18>] ? purge_orig+0x198/0x1d8 [batman_adv]
[ 4506.496007]  [<ffffffffa1012880>] ? purge_orig+0x0/0x1d8 [batman_adv]
[ 4506.496007]  [<ffffffff8105b43b>] ? worker_thread+0x174/0x211
[ 4506.496007]  [<ffffffff8105f39e>] ? autoremove_wake_function+0x0/0x2e
[ 4506.496007]  [<ffffffff8105b2c7>] ? worker_thread+0x0/0x211
[ 4506.496007]  [<ffffffff8105f042>] ? kthread+0x8b/0x93
[ 4506.496007]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 4506.496007]  [<ffffffff8105efb7>] ? kthread+0x0/0x93
[ 4506.496007]  [<ffffffff81011ba0>] ? child_rip+0x0/0x2
  
Andrew Lunn Dec. 3, 2009, 6 a.m. UTC | #3
On Thu, Dec 03, 2009 at 02:31:22AM +0100, Linus L??ssing wrote:
> Hi Simon,
> 
> I just gave your patch a try on my laptop and could successfully,
> reproduceably crash my kernel in the following way:

This looks like a deadlock. 

Simon: Did you try lockdep on this new code? 

http://lwn.net/Articles/185666/

This should hopefully show you what locks are being taken in the wrong
order. This is something i keep intending on running, but never get
around to it. Now we have a probably deadlock, it might be enough
reason for me to actually do it.

	Andrew
  
Sven Eckelmann Dec. 4, 2009, 5:39 p.m. UTC | #4
Andrew Lunn wrote:
> On Thu, Dec 03, 2009 at 02:31:22AM +0100, Linus L??ssing wrote:
> > Hi Simon,
> >
> > I just gave your patch a try on my laptop and could successfully,
> > reproduceably crash my kernel in the following way:
> 
> This looks like a deadlock.
> 
> Simon: Did you try lockdep on this new code?
> 
> http://lwn.net/Articles/185666/
> 
> This should hopefully show you what locks are being taken in the wrong
> order. This is something i keep intending on running, but never get
> around to it. Now we have a probably deadlock, it might be enough
> reason for me to actually do it.

I am currently playing a little bit around and it seems that this tools will
be extreme helpful. Output of the new code:

[  330.076502] batman-adv:B.A.T.M.A.N. advanced 0.2.1-beta (compatibility version 8) loaded                                                         
[  345.187520] batman-adv:Adding interface: eth0                                                                                                    
[  345.187520] batman-adv:Interface activated: eth0                                                                                                 
[  401.149958]                                                                                                                                      
[  401.149961] =================================                                                                                                    
[  401.151567] [ INFO: inconsistent lock state ]                                                                                                    
[  401.153477] 2.6.32 #2                                                                                                                            
[  401.153477] ---------------------------------                                                                                                    
[  401.153477] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.                                                                                 
[  401.153477] bat_events/5703 [HC0[0]:SC0[0]:HE1:SE1] takes:                                                                                       
[  401.153477]  (forw_bcast_list_lock){+.?...}, at: [<f8a02af2>] send_outstanding_bcast_packet+0x22/0x194 [batman_adv]                              
[  401.153477] {IN-SOFTIRQ-W} state was registered at:                                                                                              
[  401.153477]   [<c108249d>] __lock_acquire+0x6aa/0x1aa9                                                                                           
[  401.153477]   [<c108392c>] lock_acquire+0x90/0xc3                                                                                                
[  401.153477]   [<c1a5ec60>] _spin_lock+0x3a/0xda                                                                                                  
[  401.153477]   [<f8a01cbb>] _add_bcast_packet_to_list+0x2e/0xfb [batman_adv]                                                                      
[  401.153477]   [<f8a01ec4>] add_bcast_packet_to_list+0x13c/0x150 [batman_adv]                                                                     
[  401.153477]   [<f8a05da1>] interface_tx+0x12f/0x316 [batman_adv]                                                                                 
[  401.153477]   [<c189f6be>] dev_hard_start_xmit+0x439/0x593                                                                                       
[  401.153477]   [<c18bc193>] sch_direct_xmit+0xa0/0x2c2                                                                                            
[  401.153477]   [<c189ff38>] dev_queue_xmit+0x536/0x847                                                                                            
[  401.153477]   [<c18ab9e9>] neigh_resolve_output+0x3ff/0x4b3                                                                                      
[  401.153477]   [<c194409e>] ip6_output_finish+0x12d/0x1af                                                                                         
[  401.153477]   [<c19473d8>] ip6_output2+0x312/0x336                                                                                               
[  401.153477]   [<c1948463>] ip6_output+0x1067/0x1081                                                                                              
[  401.153477]   [<c196fc56>] mld_sendpack+0x25b/0x400                                                                                              
[  401.153477]   [<c197144f>] mld_ifc_timer_expire+0x41a/0x4a4                                                                                      
[  401.153477]   [<c105332a>] run_timer_softirq+0x274/0x355                                                                                         
[  401.153477]   [<c104bf63>] __do_softirq+0xc0/0x1f8                                                                                               
[  401.153477] irq event stamp: 3945                                                                                                                
[  401.153477] hardirqs last  enabled at (3945): [<c1a5e82b>] _spin_unlock_irq+0x5a/0x7a                                                            
[  401.153477] hardirqs last disabled at (3944): [<c1a5ee16>] _spin_lock_irq+0x29/0xfc                                                              
[  401.153477] softirqs last  enabled at (3922): [<c104c085>] __do_softirq+0x1e2/0x1f8                                                              
[  401.153477] softirqs last disabled at (3913): [<c100654e>] do_softirq+0xad/0x1cf                                                                 
[  401.153477]                                                                                                                                      
[  401.153477] other info that might help us debug this:
[  401.153477] 2 locks held by bat_events/5703:
[  401.153477]  #0:  (bat_events){+.+...}, at: [<c105fc73>] worker_thread+0x256/0x4a9
[  401.153477]  #1:  (&(&forw_packet->delayed_work)->work){+.+...}, at: [<c105fc93>] worker_thread+0x276/0x4a9
[  401.153477]
[  401.153477] stack backtrace:
[  401.153477] Pid: 5703, comm: bat_events Not tainted 2.6.32 #2
[  401.153477] Call Trace:
[  401.153477]  [<c1a5a8cd>] ? printk+0x2b/0x4e
[  401.153477]  [<c107feab>] valid_state+0x252/0x290
[  401.153477]  [<c10800b6>] mark_lock+0x1cd/0x473
[  401.153477]  [<c1081323>] ? check_usage_backwards+0x0/0xcc
[  401.153477]  [<c10825da>] __lock_acquire+0x7e7/0x1aa9
[  401.153477]  [<c107ee0d>] ? save_trace+0x45/0x128
[  401.153477]  [<c107efc9>] ? add_lock_to_list+0xd9/0x133
[  401.153477]  [<c10835cb>] ? __lock_acquire+0x17d8/0x1aa9
[  401.153477]  [<c1083764>] ? __lock_acquire+0x1971/0x1aa9
[  401.153477]  [<c105fc93>] ? worker_thread+0x276/0x4a9
[  401.153477]  [<c108392c>] lock_acquire+0x90/0xc3
[  401.153477]  [<f8a02af2>] ? send_outstanding_bcast_packet+0x22/0x194 [batman_adv]
[  401.153477]  [<c1a5ec60>] _spin_lock+0x3a/0xda
[  401.153477]  [<f8a02af2>] ? send_outstanding_bcast_packet+0x22/0x194 [batman_adv]
[  401.153477]  [<f8a02af2>] send_outstanding_bcast_packet+0x22/0x194 [batman_adv]
[  401.153477]  [<c105fcce>] worker_thread+0x2b1/0x4a9
[  401.153477]  [<c105fc93>] ? worker_thread+0x276/0x4a9
[  401.153477]  [<f8a02ad0>] ? send_outstanding_bcast_packet+0x0/0x194 [batman_adv]
[  401.153477]  [<c1065cbf>] ? autoremove_wake_function+0x0/0x5d
[  401.153477]  [<c105fa1d>] ? worker_thread+0x0/0x4a9
[  401.153477]  [<c1065924>] kthread+0xa6/0xb9
[  401.153477]  [<c106587e>] ? kthread+0x0/0xb9
[  401.153477]  [<c1004c07>] kernel_thread_helper+0x7/0x10
[  411.264212] bat0: no IPv6 routers present



And after a while (adding and removing some nodes from my virtual network
we also get something like Linus problem - but with worker thread included
and without free_orig).

[ 1903.748085] BUG: soft lockup - CPU#0 stuck for 61s! [bat_events:5703]                                                                            
[ 1903.748109] Modules linked in: batman_adv                                                                                                        
[ 1903.748109] irq event stamp: 3945                                                                                                                
[ 1903.748109] hardirqs last  enabled at (3945): [<c1a5e82b>] _spin_unlock_irq+0x5a/0x7a                                                            
[ 1903.748109] hardirqs last disabled at (3944): [<c1a5ee16>] _spin_lock_irq+0x29/0xfc                                                              
[ 1903.748109] softirqs last  enabled at (3922): [<c104c085>] __do_softirq+0x1e2/0x1f8                                                              
[ 1903.748109] softirqs last disabled at (3913): [<c100654e>] do_softirq+0xad/0x1cf                                                                 
[ 1903.748109]                                                                                                                                      
[ 1903.748109] Pid: 5703, comm: bat_events Not tainted (2.6.32 #2)                                                                                  
[ 1903.748109] EIP: 0060:[<c12be8ab>] EFLAGS: 00000246 CPU: 0                                                                                       
[ 1903.748109] EIP is at delay_tsc+0x17/0xb1                                                                                                        
[ 1903.748109] EAX: 00000001 EBX: 00000001 ECX: 67d98e54 EDX: 00001f2d                                                                              
[ 1903.748109] ESI: 00000000 EDI: f6cf50b4 EBP: f206be84 ESP: f206be80                                                                              
[ 1903.748109]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068                                                                                        
[ 1903.748109] CR0: 8005003b CR2: 080ef480 CR3: 37bcc000 CR4: 000006f0                                                                              
[ 1903.748109] DR0: c1041443 DR1: 00000000 DR2: 00000000 DR3: 00000000                                                                              
[ 1903.748109] DR6: ffff0ff0 DR7: 00000400                                                                                                          
[ 1903.748109] Call Trace:                                                                                                                          
[ 1903.748109]  [<c12be979>] __delay+0x17/0x27                                                                                                      
[ 1903.748109]  [<c12dc3e6>] _raw_spin_lock+0x164/0x1fc                                                                                             
[ 1903.748109]  [<c1a5ecc5>] _spin_lock+0x9f/0xda                                                                                                   
[ 1903.748109]  [<f8a0bf44>] purge_orig+0x73/0x3c7 [batman_adv]                                                                                     
[ 1903.748109]  [<c1080bae>] ? trace_hardirqs_on+0x27/0x37                                                                                          
[ 1903.748109]  [<c1a5e82b>] ? _spin_unlock_irq+0x5a/0x7a                                                                                           
[ 1903.748109]  [<c1050032>] ? __register_sysctl_paths+0x1c1/0x428                                                                                  
[ 1903.748109]  [<c105fcce>] worker_thread+0x2b1/0x4a9                                                                                              
[ 1903.748109]  [<c105fc93>] ? worker_thread+0x276/0x4a9                                                                                            
[ 1903.748109]  [<f8a0bed1>] ? purge_orig+0x0/0x3c7 [batman_adv]                                                                                    
[ 1903.748109]  [<c1065cbf>] ? autoremove_wake_function+0x0/0x5d                                                                                    
[ 1903.748109]  [<c105fa1d>] ? worker_thread+0x0/0x4a9                                                                                              
[ 1903.748109]  [<c1065924>] kthread+0xa6/0xb9                                                                                                      
[ 1903.748109]  [<c106587e>] ? kthread+0x0/0xb9                                                                                                     
[ 1903.748109]  [<c1004c07>] kernel_thread_helper+0x7/0x10                                                                                          
[ 1969.244089] BUG: soft lockup - CPU#0 stuck for 61s! [bat_events:5703]                                                                            
[ 1969.244089] Modules linked in: batman_adv                                                                                                        
[ 1969.244089] irq event stamp: 3945                                                                                                                
[ 1969.244089] hardirqs last  enabled at (3945): [<c1a5e82b>] _spin_unlock_irq+0x5a/0x7a                                                            
[ 1969.244089] hardirqs last disabled at (3944): [<c1a5ee16>] _spin_lock_irq+0x29/0xfc                                                              
[ 1969.244089] softirqs last  enabled at (3922): [<c104c085>] __do_softirq+0x1e2/0x1f8                                                              
[ 1969.244089] softirqs last disabled at (3913): [<c100654e>] do_softirq+0xad/0x1cf                                                                 
[ 1969.244089]                                                                                                                                      
[ 1969.244089] Pid: 5703, comm: bat_events Not tainted (2.6.32 #2)                                                                                  
[ 1969.244089] EIP: 0060:[<c12be8df>] EFLAGS: 00000246 CPU: 0                                                                                       
[ 1969.244089] EIP is at delay_tsc+0x4b/0xb1                                                                                                        
[ 1969.244089] EAX: 15363869 EBX: 00000001 ECX: 15363869 EDX: 00001f4f                                                                              
[ 1969.244089] ESI: 00000000 EDI: f6cf50b4 EBP: f206be84 ESP: f206be80                                                                              
[ 1969.244089]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068                                                                                        
[ 1969.244089] CR0: 8005003b CR2: 080ef480 CR3: 37bcc000 CR4: 000006f0                                                                              
[ 1969.244089] DR0: c1041443 DR1: 00000000 DR2: 00000000 DR3: 00000000                                                                              
[ 1969.244089] DR6: ffff0ff0 DR7: 00000400                                                                                                          
[ 1969.244089] Call Trace:                                                                                                                          
[ 1969.244089]  [<c12be979>] __delay+0x17/0x27                                                                                                      
[ 1969.244089]  [<c12dc3e6>] _raw_spin_lock+0x164/0x1fc                                                                                             
[ 1969.244089]  [<c1a5ecc5>] _spin_lock+0x9f/0xda                                                                                                   
[ 1969.244089]  [<f8a0bf44>] purge_orig+0x73/0x3c7 [batman_adv]                                                                                     
[ 1969.244089]  [<c1080bae>] ? trace_hardirqs_on+0x27/0x37                                                                                          
[ 1969.244089]  [<c1a5e82b>] ? _spin_unlock_irq+0x5a/0x7a                                                                                           
[ 1969.244089]  [<c1050032>] ? __register_sysctl_paths+0x1c1/0x428                                                                                  
[ 1969.244089]  [<c105fcce>] worker_thread+0x2b1/0x4a9                                                                                              
[ 1969.244089]  [<c105fc93>] ? worker_thread+0x276/0x4a9                                                                                            
[ 1969.244089]  [<f8a0bed1>] ? purge_orig+0x0/0x3c7 [batman_adv]                                                                                    
[ 1969.244089]  [<c1065cbf>] ? autoremove_wake_function+0x0/0x5d                                                                                    
[ 1969.244089]  [<c105fa1d>] ? worker_thread+0x0/0x4a9                                                                                              
[ 1969.244089]  [<c1065924>] kthread+0xa6/0xb9                                                                                                      
[ 1969.244089]  [<c106587e>] ? kthread+0x0/0xb9                                                                                                     
[ 1969.244089]  [<c1004c07>] kernel_thread_helper+0x7/0x10                                                                                          
[ 1972.181954] BUG: spinlock lockup on CPU#0, bat_events/5703, f6cf50b4                                                                             
[ 1972.183041] Pid: 5703, comm: bat_events Not tainted 2.6.32 #2                                                                                    
[ 1972.184006] Call Trace:                                                                                                                          
[ 1972.184627]  [<c1a5a8cd>] ? printk+0x2b/0x4e                                                                                                     
[ 1972.185362]  [<c12dc458>] _raw_spin_lock+0x1d6/0x1fc                                                                                             
[ 1972.186208]  [<c1a5ecc5>] _spin_lock+0x9f/0xda                                                                                                   
[ 1972.186973]  [<f8a0bf44>] purge_orig+0x73/0x3c7 [batman_adv]                                                                                     
[ 1972.187934]  [<c1080bae>] ? trace_hardirqs_on+0x27/0x37                                                                                          
[ 1972.189019]  [<c1a5e82b>] ? _spin_unlock_irq+0x5a/0x7a                                                                                           
[ 1972.189914]  [<c1050032>] ? __register_sysctl_paths+0x1c1/0x428                                                                                  
[ 1972.190917]  [<c105fcce>] worker_thread+0x2b1/0x4a9                                                                                              
[ 1972.191746]  [<c105fc93>] ? worker_thread+0x276/0x4a9                                                                                            
[ 1972.192810]  [<f8a0bed1>] ? purge_orig+0x0/0x3c7 [batman_adv]                                                                                    
[ 1972.193798]  [<c1065cbf>] ? autoremove_wake_function+0x0/0x5d                                                                                    
[ 1972.194773]  [<c105fa1d>] ? worker_thread+0x0/0x4a9                                                                                              
[ 1972.195595]  [<c1065924>] kthread+0xa6/0xb9                                                                                                      
[ 1972.196494]  [<c106587e>] ? kthread+0x0/0xb9                                                                                                     
[ 1972.197323]  [<c1004c07>] kernel_thread_helper+0x7/0x10


And here some recursive locking which was found by Linus before:

[ 1848.122378] =============================================                                                                                    
[ 1848.123961] [ INFO: possible recursive locking detected ]                                                                                    
[ 1848.124440] 2.6.32 #2                                                                                                                        
[ 1848.124440] ---------------------------------------------                                                                                    
[ 1848.124440] bat_events/5424 is trying to acquire lock:                                                                                       
[ 1848.124440]  (&orig_node->lock){+.+...}, at: [<f8a0bd4a>] free_orig_node+0x3c/0xeb [batman_adv]                                              
[ 1848.124440]                                                                                                                                  
[ 1848.124440] but task is already holding lock:                                                                                                
[ 1848.124440]  (&orig_node->lock){+.+...}, at: [<f8a0c002>] purge_orig+0xa7/0x4a5 [batman_adv]
[ 1848.124440]
[ 1848.124440] other info that might help us debug this:
[ 1848.124440] 3 locks held by bat_events/5424:
[ 1848.124440]  #0:  (bat_events){+.+...}, at: [<c105fc73>] worker_thread+0x256/0x4a9
[ 1848.124440]  #1:  ((purge_orig_wq).work){+.+...}, at: [<c105fc93>] worker_thread+0x276/0x4a9
[ 1848.124440]  #2:  (&orig_node->lock){+.+...}, at: [<f8a0c002>] purge_orig+0xa7/0x4a5 [batman_adv]
[ 1848.124440]
[ 1848.124440] stack backtrace:
[ 1848.124440] Pid: 5424, comm: bat_events Not tainted 2.6.32 #2
[ 1848.124440] Call Trace:
[ 1848.124440]  [<c1a5a8cd>] ? printk+0x2b/0x4e
[ 1848.124440]  [<c108313c>] __lock_acquire+0x1349/0x1aa9
[ 1848.124440]  [<c1043245>] ? release_console_sem+0x30b/0x359
[ 1848.124440]  [<c108392c>] lock_acquire+0x90/0xc3
[ 1848.124440]  [<f8a0bd4a>] ? free_orig_node+0x3c/0xeb [batman_adv]
[ 1848.124440]  [<c1a5ec60>] _spin_lock+0x3a/0xda
[ 1848.124440]  [<f8a0bd4a>] ? free_orig_node+0x3c/0xeb [batman_adv]
[ 1848.124440]  [<f8a0bd4a>] free_orig_node+0x3c/0xeb [batman_adv]
[ 1848.124440]  [<f8a0c325>] purge_orig+0x3ca/0x4a5 [batman_adv]
[ 1848.124440]  [<c106e623>] ? sched_clock_cpu+0x19e/0x1bf
[ 1848.124440]  [<c107e325>] ? trace_hardirqs_off+0x27/0x37
[ 1848.124440]  [<c106e6c6>] ? cpu_clock+0x82/0xcf
[ 1848.124440]  [<c1050032>] ? __register_sysctl_paths+0x1c1/0x428
[ 1848.124440]  [<c105fcce>] worker_thread+0x2b1/0x4a9
[ 1848.124440]  [<c105fc93>] ? worker_thread+0x276/0x4a9
[ 1848.124440]  [<f8a0bf5b>] ? purge_orig+0x0/0x4a5 [batman_adv]
[ 1848.124440]  [<c1065cbf>] ? autoremove_wake_function+0x0/0x5d
[ 1848.124440]  [<c105fa1d>] ? worker_thread+0x0/0x4a9
[ 1848.124440]  [<c1065924>] kthread+0xa6/0xb9
[ 1848.124440]  [<c106587e>] ? kthread+0x0/0xb9
[ 1848.124440]  [<c1004c07>] kernel_thread_helper+0x7/0x10


I created a kernel which is mostly compatible with my uml image. So if
anyone wants to try it, I am using a i386 kernel with kvm and use


$ kvm -m 1024 -kernel linux-2.6.32.qemu -drive file=root.qemu,if=virtio -serial stdio -net 
nic,macaddr=02:ca:ff:ee:ba:43,model=virtio,vlan=3 -net tap,ifname=tap3,vlan=3,script=no

to test it (all virtual devices are managed inside a bridge managed by me).
I am a little bit confused why the spinlock deadlocks because it should be an
unicore processor build - but maybe it has to do with lockdep.

Best regards,
	Sven
  
Simon Wunderlich Dec. 13, 2009, 9:17 p.m. UTC | #5
Hello Andrew,

sorry for the late answer and thank you for your remarks.

I like your idea of the static hash iterator and have got this implemented in
svn revision 1499. This should remove the problems about the memory leaks
with dangling hash iterators. What do you think?

As the patch obviously has some more severe problems (deadlocks etc) i will
have to review this seperately ... :(

On Mon, Nov 30, 2009 at 11:09:02AM +0100, Andrew Lunn wrote:
> On Sun, Nov 29, 2009 at 09:09:50PM +0100, Simon Wunderlich wrote:
> 
> > I did some testing, including loading, unloading, killing individual
> > nodes etc, which seems to be clean so far. However there might be 
> > more race conditions introduced by this large patch, and i'm therefore
> > requesting a careful review before committing.
> 
> Hi Simon
> 
> I've not done a careful review yet, just a quick look.
> 
> Two idea for improvements:
> 
> In the purge code, add some debug code which looks at the refcount
> value. If it is not between 1 and 5, prinkt() a warning what there is
> probably a missing refcount operation. Since the purge code does not
> run very often, it should not add much overhead, yet it is still a
> useful debug tool.
> 
> The following bit of code happens quite a lot:
> 
> while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
> 
> 
> There is also a comment about having to free hashit, if you don't
> iterate to the end of the hash. How about refactoring this, more like
> the linux list.h.
> 
> Make hashit a stack variable, with an init macro:
> 
> #define HASHIT(name) struct hash_it_t name = { .index = -1, .bucket = NULL, \
> 	                                       .prev_bucket=NULL,           \
>  					       .first_bucket=NULL }
> 
> and a macro for iterating over the hash
>     
>     HASHIT(hashit);
> 
>     orig_hash_for_each(orig_node, hashit) {
> 
>     foo(orig_node);
>     }
> 
> 
>     Andrew
> _______________________________________________
> B.A.T.M.A.N mailing list
> B.A.T.M.A.N@lists.open-mesh.net
> https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n
>
  

Patch

Index: batman-adv-kernelland/vis.c
===================================================================
--- batman-adv-kernelland/vis.c	(revision 1489)
+++ batman-adv-kernelland/vis.c	(working copy)
@@ -26,6 +26,7 @@ 
 #include "soft-interface.h"
 #include "hard-interface.h"
 #include "hash.h"
+#include "originator.h"
 #include "compat.h"
 
 struct hashtable_t *vis_hash;
@@ -262,16 +263,16 @@ 
 
 /* Walk the originators and find the VIS server with the best tq. Set the packet
  * address to its address and return the best_tq.
- *
- * Must be called with the originator hash locked */
+ */
 static int find_best_vis_server(struct vis_info *info)
 {
 	struct hash_it_t *hashit = NULL;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 	int best_tq = -1;
 
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-		orig_node = hashit->bucket->data;
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
+
+		spin_lock(&orig_node->lock);
 		if ((orig_node != NULL) &&
 		    (orig_node->router != NULL) &&
 		    (orig_node->flags & VIS_SERVER) &&
@@ -280,6 +281,7 @@ 
 			memcpy(info->packet.target_orig, orig_node->orig,
 			       ETH_ALEN);
 		}
+		spin_unlock(&orig_node->lock);
 	}
 	return best_tq;
 }
@@ -298,7 +300,7 @@ 
 static int generate_vis_packet(void)
 {
 	struct hash_it_t *hashit = NULL;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 	struct vis_info *info = (struct vis_info *)my_vis_info;
 	struct vis_info_entry *entry, *entry_array;
 	struct hna_local_entry *hna_local_entry;
@@ -307,7 +309,6 @@ 
 
 	info->first_seen = jiffies;
 
-	spin_lock(&orig_hash_lock);
 	memcpy(info->packet.target_orig, broadcastAddr, ETH_ALEN);
 	info->packet.ttl = TTL;
 	info->packet.seqno++;
@@ -316,17 +317,17 @@ 
 	if (!is_vis_server_locked()) {
 		best_tq = find_best_vis_server(info);
 		if (best_tq < 0) {
-			spin_unlock(&orig_hash_lock);
 			return -1;
 		}
 	}
-	hashit = NULL;
 
 	entry_array = (struct vis_info_entry *)
 		((char *)info + sizeof(struct vis_info));
 
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-		orig_node = hashit->bucket->data;
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
+
+		spin_lock(&orig_node->lock);
+
 		if (orig_node->router != NULL
 			&& compare_orig(orig_node->router->addr, orig_node->orig)
 			&& orig_node->batman_if
@@ -340,15 +341,16 @@ 
 			entry->quality = orig_node->router->tq_avg;
 			info->packet.entries++;
 
+			/* TODO: this is is a possible memory leak, 
+			 * hashit should be freed somewhere. */
 			if (vis_packet_full(info)) {
-				spin_unlock(&orig_hash_lock);
+				spin_unlock(&orig_node->lock);
+				orig_unreference(orig_node);
 				return 0;
 			}
 		}
 	}
 
-	spin_unlock(&orig_hash_lock);
-
 	hashit = NULL;
 	spin_lock_irqsave(&hna_local_hash_lock, flags);
 	while (NULL != (hashit = hash_iterate(hna_local_hash, hashit))) {
@@ -388,14 +390,12 @@ 
 static void broadcast_vis_packet(struct vis_info *info, int packet_length)
 {
 	struct hash_it_t *hashit = NULL;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 
-	spin_lock(&orig_hash_lock);
-
 	/* send to all routers in range. */
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-		orig_node = hashit->bucket->data;
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
 
+		spin_lock(&orig_node->lock);
 		/* if it's a vis server and reachable, send it. */
 		if (orig_node &&
 		    (orig_node->flags & VIS_SERVER) &&
@@ -415,27 +415,28 @@ 
 					orig_node->batman_if,
 					orig_node->router->addr);
 		}
+		spin_unlock(&orig_node->lock);
 	}
 	memcpy(info->packet.target_orig, broadcastAddr, ETH_ALEN);
-	spin_unlock(&orig_hash_lock);
 }
 
 static void unicast_vis_packet(struct vis_info *info, int packet_length)
 {
 	struct orig_node *orig_node;
 
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)
-		     hash_find(orig_hash, info->packet.target_orig));
+	orig_node = orig_find(info->packet.target_orig);
 
-	if ((orig_node != NULL) &&
-	    (orig_node->batman_if != NULL) &&
-	    (orig_node->router != NULL)) {
-		send_raw_packet((unsigned char *) &info->packet, packet_length,
+	if (orig_node) {
+		spin_lock(&orig_node->lock);
+	    if ((orig_node->batman_if != NULL) &&
+	    	(orig_node->router != NULL)) {
+			send_raw_packet((unsigned char *) &info->packet, packet_length,
 				orig_node->batman_if,
 				orig_node->router->addr);
+		}
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 	}
-	spin_unlock(&orig_hash_lock);
 }
 
 /* only send one vis packet. called from send_vis_packets() */
Index: batman-adv-kernelland/Makefile.kbuild
===================================================================
--- batman-adv-kernelland/Makefile.kbuild	(revision 1489)
+++ batman-adv-kernelland/Makefile.kbuild	(working copy)
@@ -25,7 +25,7 @@ 
 -include $(TOPDIR)/Rules.make
 endif
 
-# EXTRA_CFLAGS += -DCONFIG_BATMAN_DEBUG
+EXTRA_CFLAGS += -DCONFIG_BATMAN_DEBUG
 
 ifneq ($(REVISION),)
 EXTRA_CFLAGS += -DREVISION_VERSION=\"r$(REVISION)\"
Index: batman-adv-kernelland/types.h
===================================================================
--- batman-adv-kernelland/types.h	(revision 1489)
+++ batman-adv-kernelland/types.h	(working copy)
@@ -62,6 +62,8 @@ 
 	int16_t  hna_buff_len;
 	uint16_t last_real_seqno;   /* last and best known squence number */
 	uint8_t last_ttl;         /* ttl of last received packet */
+	spinlock_t lock;
+	atomic_t refcnt;
 	TYPE_OF_WORD bcast_bits[NUM_WORDS];
 	uint16_t last_bcast_seqno;  /* last broadcast sequence number received by this host */
 	struct list_head neigh_list;
Index: batman-adv-kernelland/translation-table.c
===================================================================
--- batman-adv-kernelland/translation-table.c	(revision 1489)
+++ batman-adv-kernelland/translation-table.c	(working copy)
@@ -24,6 +24,7 @@ 
 #include "soft-interface.h"
 #include "types.h"
 #include "hash.h"
+#include "originator.h"
 #include "compat.h"
 
 struct hashtable_t *hna_local_hash;
@@ -257,6 +258,9 @@ 
 	return 1;
 }
 
+
+/* adds an HNA entry to the global address table.
+ * orig_node's spinlock should be hold from outside */
 void hna_global_add_orig(struct orig_node *orig_node,
 			 unsigned char *hna_buff, int hna_buff_len)
 {
@@ -299,7 +303,7 @@ 
 
 		}
 
-		hna_global_entry->orig_node = orig_node;
+		hna_global_entry->orig_node = orig_reference(orig_node);
 		spin_unlock_irqrestore(&hna_global_hash_lock, flags);
 
 		/* remove address from local hash if present */
@@ -391,6 +395,8 @@ 
 		hna_str, orig_str, message);
 
 	hash_remove(hna_global_hash, hna_global_entry->addr);
+	orig_unreference(hna_global_entry->orig_node);
+
 	kfree(hna_global_entry);
 }
 
@@ -452,5 +458,5 @@ 
 	if (hna_global_entry == NULL)
 		return NULL;
 
-	return hna_global_entry->orig_node;
+	return orig_reference(hna_global_entry->orig_node);
 }
Index: batman-adv-kernelland/send.c
===================================================================
--- batman-adv-kernelland/send.c	(revision 1489)
+++ batman-adv-kernelland/send.c	(working copy)
@@ -294,6 +294,7 @@ 
 	batman_packet->ttl--;
 	memcpy(batman_packet->prev_sender, ethhdr->h_source, ETH_ALEN);
 
+	spin_lock(&orig_node->lock);
 	/* rebroadcast tq of our best ranking neighbor to ensure the rebroadcast
 	 * of our best tq value */
 	if ((orig_node->router) && (orig_node->router->tq_avg != 0)) {
@@ -308,6 +309,7 @@ 
 
 		tq_avg = orig_node->router->tq_avg;
 	}
+	spin_unlock(&orig_node->lock);
 
 	/* apply hop penalty */
 	batman_packet->tq = hop_penalty(batman_packet->tq);
Index: batman-adv-kernelland/device.c
===================================================================
--- batman-adv-kernelland/device.c	(revision 1489)
+++ batman-adv-kernelland/device.c	(working copy)
@@ -24,6 +24,7 @@ 
 #include "send.h"
 #include "types.h"
 #include "hash.h"
+#include "originator.h"
 
 #include "compat.h"
 
@@ -205,6 +206,7 @@ 
 	struct icmp_packet icmp_packet;
 	struct orig_node *orig_node;
 	struct batman_if *batman_if;
+	uint8_t router_addr[ETH_ALEN];
 
 	if (len < sizeof(struct icmp_packet)) {
 		printk(KERN_DEBUG "batman-adv:Error - can't send packet from char device: invalid packet size\n");
@@ -239,15 +241,20 @@ 
 	if (atomic_read(&module_state) != MODULE_ACTIVE)
 		goto dst_unreach;
 
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)hash_find(orig_hash, icmp_packet.dst));
+	orig_node = orig_find(icmp_packet.dst);
 
 	if (!orig_node)
-		goto unlock;
+		goto dst_unreach;
 
+	spin_lock(&orig_node->lock);
+
 	if (!orig_node->router)
 		goto unlock;
 
+	memcpy(router_addr,
+	       orig_node->router->addr,
+	       ETH_ALEN);
+
 	batman_if = orig_node->batman_if;
 
 	if (!batman_if)
@@ -257,15 +264,18 @@ 
 	       batman_if->net_dev->dev_addr,
 	       ETH_ALEN);
 
+	spin_unlock(&orig_node->lock);
+	orig_unreference(orig_node);
+
 	send_raw_packet((unsigned char *)&icmp_packet,
 			sizeof(struct icmp_packet),
-			batman_if, orig_node->router->addr);
+			batman_if, router_addr);
 
-	spin_unlock(&orig_hash_lock);
 	goto out;
 
 unlock:
-	spin_unlock(&orig_hash_lock);
+	spin_unlock(&orig_node->lock);
+	orig_unreference(orig_node);
 dst_unreach:
 	icmp_packet.msg_type = DESTINATION_UNREACHABLE;
 	bat_device_add_packet(device_client, &icmp_packet);
Index: batman-adv-kernelland/proc.c
===================================================================
--- batman-adv-kernelland/proc.c	(revision 1489)
+++ batman-adv-kernelland/proc.c	(working copy)
@@ -28,6 +28,7 @@ 
 #include "hash.h"
 #include "vis.h"
 #include "compat.h"
+#include "originator.h"
 
 static uint8_t vis_format = DOT_DRAW;
 
@@ -187,7 +188,7 @@ 
 static int proc_originators_read(struct seq_file *seq, void *offset)
 {
 	struct hash_it_t *hashit = NULL;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 	struct neigh_node *neigh_node;
 	int batman_count = 0;
 	char orig_str[ETH_STR_LEN], router_str[ETH_STR_LEN];
@@ -213,11 +214,10 @@ 
 		   ((struct batman_if *)if_list.next)->addr_str);
 
 	rcu_read_unlock();
-	spin_lock(&orig_hash_lock);
 
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
 
-		orig_node = hashit->bucket->data;
+		spin_lock(&orig_node->lock);
 
 		if (!orig_node->router)
 			continue;
@@ -241,11 +241,10 @@ 
 		}
 
 		seq_printf(seq, "\n");
+		spin_unlock(&orig_node->lock);
 
 	}
 
-	spin_unlock(&orig_hash_lock);
-
 	if (batman_count == 0)
 		seq_printf(seq, "No batman nodes in range ... \n");
 
Index: batman-adv-kernelland/soft-interface.c
===================================================================
--- batman-adv-kernelland/soft-interface.c	(revision 1489)
+++ batman-adv-kernelland/soft-interface.c	(working copy)
@@ -26,6 +26,7 @@ 
 #include "translation-table.h"
 #include "types.h"
 #include "hash.h"
+#include "originator.h"
 #include <linux/ethtool.h>
 #include <linux/etherdevice.h>
 #include "compat.h"
@@ -34,7 +35,6 @@ 
 				  * broadcast storms */
 static int32_t skb_packets;
 static int32_t skb_bad_packets;
-static int32_t lock_dropped;
 
 unsigned char mainIfAddr[ETH_ALEN];
 static unsigned char mainIfAddr_default[ETH_ALEN];
@@ -205,60 +205,55 @@ 
 	/* unicast packet */
 	} else {
 
-		/* simply spin_lock()ing can deadlock when the lock is already
-		 * hold. */
-		/* TODO: defer the work in a working queue instead of
-		 * dropping */
-		if (!spin_trylock(&orig_hash_lock)) {
-			lock_dropped++;
-			printk(KERN_WARNING "batman-adv:%d packets dropped because lock was hold\n", lock_dropped);
-			goto dropped;
-		}
-
 		/* get routing information */
-		orig_node = ((struct orig_node *)hash_find(orig_hash,
-							   ethhdr->h_dest));
+		orig_node = orig_find(ethhdr->h_dest);
 
 		/* check for hna host */
 		if (!orig_node)
 			orig_node = transtable_search(ethhdr->h_dest);
 
-		if ((orig_node) &&
-		    (orig_node->batman_if) &&
-		    (orig_node->router)) {
-			if (my_skb_push(skb, sizeof(struct unicast_packet)) < 0)
-				goto unlock;
+		if (!orig_node) 
+			goto dropped;
 
-			unicast_packet = (struct unicast_packet *)skb->data;
+		spin_lock(&orig_node->lock);
 
-			unicast_packet->version = COMPAT_VERSION;
-			/* batman packet type: unicast */
-			unicast_packet->packet_type = BAT_UNICAST;
-			/* set unicast ttl */
-			unicast_packet->ttl = TTL;
-			/* copy the destination for faster routing */
-			memcpy(unicast_packet->dest, orig_node->orig, ETH_ALEN);
+	    if (!((orig_node->batman_if) &&
+		    (orig_node->router))) 
+			goto unlock;
 
-			/* net_dev won't be available when not active */
-			if (orig_node->batman_if->if_active != IF_ACTIVE)
-				goto unlock;
+		if (my_skb_push(skb, sizeof(struct unicast_packet)) < 0) 
+			goto unlock;
 
-			send_raw_packet(skb->data, skb->len,
-					orig_node->batman_if,
-					orig_node->router->addr);
-		} else {
+		unicast_packet = (struct unicast_packet *)skb->data;
+
+		unicast_packet->version = COMPAT_VERSION;
+		/* batman packet type: unicast */
+		unicast_packet->packet_type = BAT_UNICAST;
+		/* set unicast ttl */
+		unicast_packet->ttl = TTL;
+		/* copy the destination for faster routing */
+		memcpy(unicast_packet->dest, orig_node->orig, ETH_ALEN);
+
+		/* net_dev won't be available when not active */
+		if (orig_node->batman_if->if_active != IF_ACTIVE) 
 			goto unlock;
-		}
+		
+		send_raw_packet(skb->data, skb->len,
+			orig_node->batman_if,
+			orig_node->router->addr);
 
-		spin_unlock(&orig_hash_lock);
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 	}
 
 	priv->stats.tx_packets++;
 	priv->stats.tx_bytes += data_len;
 	goto end;
 
+
 unlock:
-	spin_unlock(&orig_hash_lock);
+	spin_unlock(&orig_node->lock);
+	orig_unreference(orig_node);
 dropped:
 	priv->stats.tx_dropped++;
 end:
Index: batman-adv-kernelland/hard-interface.c
===================================================================
--- batman-adv-kernelland/hard-interface.c	(revision 1489)
+++ batman-adv-kernelland/hard-interface.c	(working copy)
@@ -26,6 +26,7 @@ 
 #include "translation-table.h"
 #include "routing.h"
 #include "hash.h"
+#include "originator.h"
 #include "compat.h"
 
 #define MIN(x, y) ((x) < (y) ? (x) : (y))
@@ -317,8 +318,9 @@ 
 {
 	struct batman_if *batman_if;
 	struct batman_packet *batman_packet;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 	struct hash_it_t *hashit = NULL;
+	int ret;
 
 	batman_if = kmalloc(sizeof(struct batman_if), GFP_KERNEL);
 
@@ -375,18 +377,18 @@ 
 
 	/* resize all orig nodes because orig_node->bcast_own(_sum) depend on
 	 * if_num */
-	spin_lock(&orig_hash_lock);
 
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-		orig_node = hashit->bucket->data;
-		if (resize_orig(orig_node, if_num) == -1) {
-			spin_unlock(&orig_hash_lock);
-			goto out;
-		}
+	ret = 0;
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
+		ret |= resize_orig(orig_node, if_num);
 	}
 
-	spin_unlock(&orig_hash_lock);
+	if (ret) {
+		printk(KERN_ERR "batman-adv:Resizing originators for %s failed.\n", batman_if->dev);
+		goto out;
+	}
 
+
 	if (!hardif_is_interface_up(batman_if->dev))
 		printk(KERN_ERR "batman-adv:Not using interface %s (retrying later): interface not active\n", batman_if->dev);
 	else
Index: batman-adv-kernelland/originator.c
===================================================================
--- batman-adv-kernelland/originator.c	(revision 1489)
+++ batman-adv-kernelland/originator.c	(working copy)
@@ -19,17 +19,86 @@ 
  *
  */
 
-/* increase the reference counter for this originator */
-
 #include "main.h"
+#include "hash.h"
 #include "originator.h"
-#include "hash.h"
 #include "translation-table.h"
 #include "routing.h"
 
 
+DEFINE_SPINLOCK(orig_hash_lock);
 static DECLARE_DELAYED_WORK(purge_orig_wq, purge_orig);
 
+/* increase the reference counter for this originator */
+
+struct orig_node *orig_reference(struct orig_node *orig_node)
+{
+	if (orig_node == NULL) {
+		printk(KERN_ERR "batman-adv: tried to reference a NULL orig\n");
+		dump_stack();
+		return NULL;
+	} else if (atomic_read(&orig_node->refcnt) < 1) {
+		printk(KERN_DEBUG "batman-adv: trying to access a vanished node with refcnt %d\n", atomic_read(&orig_node->refcnt));
+		return NULL;
+	} else {
+		atomic_inc(&orig_node->refcnt);
+
+		return orig_node;
+	}
+}
+
+static void __free_orig_node(void *data)
+{
+	struct orig_node *orig_node = (struct orig_node *)data;
+
+	hna_global_del_orig(orig_node, "__free_orig_node");
+
+	kfree(orig_node->bcast_own);
+	kfree(orig_node->bcast_own_sum);
+	kfree(orig_node);
+}
+
+
+/* decrease the reference counter for this originator */
+struct orig_node *orig_unreference(struct orig_node *orig_node)
+{
+	char orig_str[ETH_STR_LEN];
+
+	if (orig_node == NULL) {
+		printk(KERN_ERR "batman-adv: tried to reference a NULL orig\n");
+		dump_stack();
+		return NULL;
+	} else if (atomic_read(&orig_node->refcnt) < 1) {
+		printk(KERN_DEBUG "batman-adv: trying to access a vanished node with refcnt %d\n", atomic_read(&orig_node->refcnt));
+		return NULL;
+	} else {
+
+		if (atomic_dec_and_test(&orig_node->refcnt)) {
+			addr_to_string(orig_str, orig_node->orig);
+			bat_dbg(DBG_BATMAN,"%s:decreasing refcnt of orig %s to %d, time to free\n", __func__, orig_str, atomic_read(&orig_node->refcnt));
+			__free_orig_node(orig_node);
+		}
+		return orig_node;
+	}
+}
+
+
+/* find an orig node and increase the reference counter */
+
+struct orig_node *orig_find(char *mac)
+{
+	struct orig_node *orig_node;
+
+	spin_lock(&orig_hash_lock);
+	orig_node = ((struct orig_node *)hash_find(orig_hash, mac));
+	spin_unlock(&orig_hash_lock);
+
+	if (orig_node)
+		orig_node = orig_reference(orig_node);
+
+	return orig_node;
+}
+
 static void start_purge_timer(void)
 {
 	queue_delayed_work(bat_event_workqueue, &purge_orig_wq, 1 * HZ);
@@ -64,6 +133,7 @@ 
 
 	spin_lock(&orig_hash_lock);
 	hash_delete(orig_hash, free_orig_node);
+
 	orig_hash = NULL;
 	spin_unlock(&orig_hash_lock);
 }
@@ -81,7 +151,7 @@ 
 	INIT_LIST_HEAD(&neigh_node->list);
 
 	memcpy(neigh_node->addr, neigh, ETH_ALEN);
-	neigh_node->orig_node = orig_neigh_node;
+	neigh_node->orig_node = orig_reference(orig_neigh_node);
 	neigh_node->if_incoming = if_incoming;
 
 	list_add_tail(&neigh_node->list, &orig_node->neigh_list);
@@ -90,39 +160,35 @@ 
 
 void free_orig_node(void *data)
 {
+	struct orig_node *orig_node = (struct orig_node *) data;
 	struct list_head *list_pos, *list_pos_tmp;
 	struct neigh_node *neigh_node;
-	struct orig_node *orig_node = (struct orig_node *)data;
 
+	hna_global_del_orig(orig_node, "originator timeout");
+
+	spin_lock(&orig_node->lock);
 	/* for all neighbours towards this originator ... */
 	list_for_each_safe(list_pos, list_pos_tmp, &orig_node->neigh_list) {
 		neigh_node = list_entry(list_pos, struct neigh_node, list);
 
+		orig_unreference(neigh_node->orig_node);
+
 		list_del(list_pos);
 		kfree(neigh_node);
 	}
+	orig_node->router = NULL;
+	spin_unlock(&orig_node->lock);
 
-	hna_global_del_orig(orig_node, "originator timed out");
-
-	kfree(orig_node->bcast_own);
-	kfree(orig_node->bcast_own_sum);
-	kfree(orig_node);
+	orig_unreference(orig_node);
 }
 
-/* this function finds or creates an originator entry for the given
- * address if it does not exits */
-struct orig_node *get_orig_node(uint8_t *addr)
+struct orig_node *orig_node_create(uint8_t *addr)
 {
 	struct orig_node *orig_node;
 	struct hashtable_t *swaphash;
 	char orig_str[ETH_STR_LEN];
 	int size;
 
-	orig_node = ((struct orig_node *)hash_find(orig_hash, addr));
-
-	if (orig_node != NULL)
-		return orig_node;
-
 	addr_to_string(orig_str, addr);
 	bat_dbg(DBG_BATMAN, "Creating new originator: %s \n", orig_str);
 
@@ -131,9 +197,12 @@ 
 	INIT_LIST_HEAD(&orig_node->neigh_list);
 
 	memcpy(orig_node->orig, addr, ETH_ALEN);
+
+	orig_node->lock = __SPIN_LOCK_UNLOCKED(device_client->lock);
 	orig_node->router = NULL;
 	orig_node->batman_if = NULL;
 	orig_node->hna_buff = NULL;
+	atomic_set(&orig_node->refcnt, 1);
 
 	size = num_ifs * sizeof(TYPE_OF_WORD) * NUM_WORDS;
 
@@ -155,7 +224,24 @@ 
 		else
 			orig_hash = swaphash;
 	}
+	return orig_node;
+}
 
+/* this function finds or creates an originator entry for the given
+ * address if it does not exits */
+struct orig_node *get_orig_node(uint8_t *addr)
+{
+	struct orig_node *orig_node;
+
+	orig_node = orig_find(addr);
+
+	if (orig_node)
+		return orig_node;
+
+	orig_node = orig_node_create(addr);
+	if (orig_node)
+		orig_node = orig_reference(orig_node_create(addr));
+
 	return orig_node;
 }
 
@@ -164,6 +250,7 @@ 
 {
 	struct list_head *list_pos, *list_pos_tmp;
 	char neigh_str[ETH_STR_LEN];
+	char orig_str[ETH_STR_LEN];
 	struct neigh_node *neigh_node;
 	bool neigh_purged = false;
 
@@ -178,10 +265,14 @@ 
 			       (neigh_node->last_valid +
 				((PURGE_TIMEOUT * HZ) / 1000)))) {
 
+			addr_to_string(orig_str, orig_node->orig);
 			addr_to_string(neigh_str, neigh_node->addr);
 			bat_dbg(DBG_BATMAN, "Neighbour timeout: originator %s, neighbour: %s, last_valid %lu\n", orig_str, neigh_str, (neigh_node->last_valid / HZ));
+			neigh_purged = true;
+			if (neigh_node == orig_node->router)
+				orig_node->router = NULL;
 
-			neigh_purged = true;
+			orig_unreference(neigh_node->orig_node);
 			list_del(list_pos);
 			kfree(neigh_node);
 		} else {
@@ -201,10 +292,10 @@ 
 
 	addr_to_string(orig_str, orig_node->orig);
 
+	spin_lock(&orig_node->lock);
 	if (time_after(jiffies,
 		       (orig_node->last_valid +
 			((2 * PURGE_TIMEOUT * HZ) / 1000)))) {
-
 		bat_dbg(DBG_BATMAN,
 			"Originator timeout: originator %s, last_valid %lu\n",
 			orig_str, (orig_node->last_valid / HZ));
@@ -215,28 +306,48 @@ 
 				      orig_node->hna_buff,
 				      orig_node->hna_buff_len);
 	}
+	spin_unlock(&orig_node->lock);
 	return false;
 }
 
 void purge_orig(struct work_struct *work)
 {
 	struct hash_it_t *hashit = NULL;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 
-	spin_lock(&orig_hash_lock);
 
 	/* for all origins... */
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-		orig_node = hashit->bucket->data;
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
+
 		if (purge_orig_node(orig_node)) {
 			hash_remove_bucket(orig_hash, hashit);
 			free_orig_node(orig_node);
 		}
 	}
 
-	spin_unlock(&orig_hash_lock);
-
 	start_purge_timer();
 }
 
 
+/* iterates over the originator hash, automatically referencing acquired orig_nodes.
+ * It also unreferences the last node if it is given.
+ *
+ * The calling function does not have to handle the orig_hash locks as this function
+ * does it. */
+struct orig_node *orig_hash_iterate(struct hash_it_t **hashit, struct orig_node *orig_node)
+{
+	if (orig_node)
+		orig_unreference(orig_node);
+
+	spin_lock(&orig_hash_lock);
+
+	*hashit = hash_iterate(orig_hash, *hashit);
+	if (*hashit)
+		orig_node = orig_reference((*hashit)->bucket->data);
+	else 
+		orig_node = NULL;
+
+	spin_unlock(&orig_hash_lock);
+
+	return orig_node;
+}
Index: batman-adv-kernelland/originator.h
===================================================================
--- batman-adv-kernelland/originator.h	(revision 1489)
+++ batman-adv-kernelland/originator.h	(working copy)
@@ -19,6 +19,8 @@ 
  *
  */
 
+extern spinlock_t orig_hash_lock;
+
 int originator_init(void);
 void free_orig_node(void *data);
 void originator_free(void);
@@ -28,4 +30,8 @@ 
 struct neigh_node *
 create_neighbor(struct orig_node *orig_node, struct orig_node *orig_neigh_node,
 		uint8_t *neigh, struct batman_if *if_incoming);
+struct orig_node *orig_reference(struct orig_node *orig);
+struct orig_node *orig_unreference(struct orig_node *orig);
+struct orig_node *orig_find(char *mac);
+struct orig_node *orig_hash_iterate(struct hash_it_t **hashit, struct orig_node *orig_node);
 
Index: batman-adv-kernelland/main.c
===================================================================
--- batman-adv-kernelland/main.c	(revision 1489)
+++ batman-adv-kernelland/main.c	(working copy)
@@ -23,7 +23,6 @@ 
 #include "proc.h"
 #include "routing.h"
 #include "send.h"
-#include "originator.h"
 #include "soft-interface.h"
 #include "device.h"
 #include "translation-table.h"
@@ -31,6 +30,7 @@ 
 #include "types.h"
 #include "vis.h"
 #include "hash.h"
+#include "originator.h"
 #include "compat.h"
 
 struct list_head if_list;
@@ -38,7 +38,6 @@ 
 struct hlist_head forw_bcast_list;
 struct hashtable_t *orig_hash;
 
-DEFINE_SPINLOCK(orig_hash_lock);
 DEFINE_SPINLOCK(forw_bat_list_lock);
 DEFINE_SPINLOCK(forw_bcast_list_lock);
 
Index: batman-adv-kernelland/routing.c
===================================================================
--- batman-adv-kernelland/routing.c	(revision 1489)
+++ batman-adv-kernelland/routing.c	(working copy)
@@ -38,24 +38,21 @@ 
 
 static atomic_t data_ready_cond;
 atomic_t exit_cond;
+
 void slide_own_bcast_window(struct batman_if *batman_if)
 {
 	struct hash_it_t *hashit = NULL;
-	struct orig_node *orig_node;
+	struct orig_node *orig_node = NULL;
 	TYPE_OF_WORD *word;
 
-	spin_lock(&orig_hash_lock);
+	while (NULL != (orig_node = orig_hash_iterate(&hashit, orig_node))) {
 
-	while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-		orig_node = hashit->bucket->data;
 		word = &(orig_node->bcast_own[batman_if->if_num * NUM_WORDS]);
 
 		bit_get_packet(word, 1, 0);
 		orig_node->bcast_own_sum[batman_if->if_num] =
 			bit_packet_count(word);
 	}
-
-	spin_unlock(&orig_hash_lock);
 }
 
 static void update_HNA(struct orig_node *orig_node,
@@ -131,6 +128,8 @@ 
 		update_HNA(orig_node, hna_buff, hna_buff_len);
 }
 
+/* TODO: rename and comment this function */
+/* TODO: review referencing here */
 static int isBidirectionalNeigh(struct orig_node *orig_node,
 				struct orig_node *orig_neigh_node,
 				struct batman_packet *batman_packet,
@@ -144,25 +143,29 @@ 
 	addr_to_string(neigh_str, orig_neigh_node->orig);
 
 	if (orig_node == orig_neigh_node) {
+		spin_lock(&orig_node->lock);
 		list_for_each_entry(tmp_neigh_node,
 				    &orig_node->neigh_list,
 				    list) {
 
 			if (compare_orig(tmp_neigh_node->addr,
-					 orig_neigh_node->orig) &&
+					 orig_node->orig) &&
 			    (tmp_neigh_node->if_incoming == if_incoming))
 				neigh_node = tmp_neigh_node;
 		}
 
 		if (neigh_node == NULL)
 			neigh_node = create_neighbor(orig_node,
-						     orig_neigh_node,
-						     orig_neigh_node->orig,
+						     orig_node,
+						     orig_node->orig,
 						     if_incoming);
 
 		neigh_node->last_valid = jiffies;
+		spin_unlock(&orig_node->lock);
 	} else {
 		/* find packet count of corresponding one hop neighbor */
+		spin_lock(&orig_neigh_node->lock);
+
 		list_for_each_entry(tmp_neigh_node,
 				    &orig_neigh_node->neigh_list, list) {
 
@@ -177,6 +180,7 @@ 
 						     orig_neigh_node,
 						     orig_neigh_node->orig,
 						     if_incoming);
+		spin_unlock(&orig_neigh_node->lock);
 	}
 
 	orig_node->last_valid = jiffies;
@@ -239,6 +243,7 @@ 
 			unsigned char *hna_buff, int hna_buff_len,
 			char is_duplicate)
 {
+	struct orig_node *source_orig_node;
 	struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL;
 	int tmp_hna_buff_len;
 
@@ -260,11 +265,13 @@ 
 			ring_buffer_avg(tmp_neigh_node->tq_recv);
 	}
 
-	if (neigh_node == NULL)
+	if (neigh_node == NULL) {
+		source_orig_node = get_orig_node(ethhdr->h_source);
 		neigh_node = create_neighbor(orig_node,
-					     get_orig_node(ethhdr->h_source),
+					     source_orig_node,
 					     ethhdr->h_source, if_incoming);
-	else
+		orig_unreference(source_orig_node);
+	} else
 		bat_dbg(DBG_BATMAN,
 			"Updating existing last-hop neighbour of originator\n");
 
@@ -323,6 +330,7 @@ 
 	if (orig_node == NULL)
 		return 0;
 
+	spin_lock(&orig_node->lock);
 	list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {
 
 		if (!is_duplicate)
@@ -346,10 +354,13 @@ 
 			orig_node->last_real_seqno, batman_packet->seqno);
 		orig_node->last_real_seqno = batman_packet->seqno;
 	}
+	spin_unlock(&orig_node->lock);
+	orig_unreference(orig_node);
 
 	return is_duplicate;
 }
 
+/* receive and handle one batman packet */
 void receive_bat_packet(struct ethhdr *ethhdr,
 			struct batman_packet *batman_packet,
 			unsigned char *hna_buff,
@@ -448,10 +459,13 @@ 
 		 * come via the corresponding interface */
 		/* if received seqno equals last send seqno save new
 		 * seqno for bidirectional check */
+
+		spin_lock(&orig_neigh_node->lock);
 		if (has_directlink_flag &&
 		    compare_orig(if_incoming->net_dev->dev_addr,
 				 batman_packet->orig) &&
 		    (batman_packet->seqno - if_incoming_seqno + 2 == 0)) {
+
 			offset = if_incoming->if_num * NUM_WORDS;
 			word = &(orig_neigh_node->bcast_own[offset]);
 			bit_mark(word, 0);
@@ -459,6 +473,9 @@ 
 				bit_packet_count(word);
 		}
 
+		spin_unlock(&orig_neigh_node->lock);
+		orig_unreference(orig_neigh_node);
+
 		bat_dbg(DBG_BATMAN, "Drop packet: originator packet from myself (via neighbour) \n");
 		return;
 	}
@@ -481,6 +498,8 @@ 
 	if (orig_node == NULL)
 		return;
 
+	spin_lock(&orig_node->lock);
+
 	/* avoid temporary routing loops */
 	if ((orig_node->router) &&
 	    (orig_node->router->orig_node->router) &&
@@ -489,36 +508,49 @@ 
 	    !(compare_orig(batman_packet->orig, batman_packet->prev_sender)) &&
 	    (compare_orig(orig_node->router->addr,
 			  orig_node->router->orig_node->router->addr))) {
+
 		bat_dbg(DBG_BATMAN, "Drop packet: ignoring all rebroadcast packets that may make me loop (sender: %s) \n", neigh_str);
+
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 		return;
 	}
+	spin_unlock(&orig_node->lock);
 
 	/* if sender is a direct neighbor the sender mac equals
 	 * originator mac */
 	orig_neigh_node = (is_single_hop_neigh ?
-			   orig_node : get_orig_node(ethhdr->h_source));
-	if (orig_neigh_node == NULL)
+			   orig_reference(orig_node) : get_orig_node(ethhdr->h_source));
+	if (orig_neigh_node == NULL) {
+		orig_unreference(orig_node);
 		return;
+	}
 
 	/* drop packet if sender is not a direct neighbor and if we
 	 * don't route towards it */
 	if (!is_single_hop_neigh &&
 	    (orig_neigh_node->router == NULL)) {
 		bat_dbg(DBG_BATMAN, "Drop packet: OGM via unknown neighbor!\n");
+
+		orig_unreference(orig_node);
+		orig_unreference(orig_neigh_node);
 		return;
 	}
 
 	is_bidirectional = isBidirectionalNeigh(orig_node, orig_neigh_node,
 						batman_packet, if_incoming);
+	orig_unreference(orig_neigh_node);
 
 	/* update ranking if it is not a duplicate or has the same
 	 * seqno and similar ttl as the non-duplicate */
+	spin_lock(&orig_node->lock);
 	if (is_bidirectional &&
 	    (!is_duplicate ||
 	     ((orig_node->last_real_seqno == batman_packet->seqno) &&
 	      (orig_node->last_ttl - 3 <= batman_packet->ttl))))
 		update_orig(orig_node, ethhdr, batman_packet,
 			    if_incoming, hna_buff, hna_buff_len, is_duplicate);
+	spin_unlock(&orig_node->lock);
 
 	/* is single hop (direct) neighbour */
 	if (is_single_hop_neigh) {
@@ -528,6 +560,7 @@ 
 					1, hna_buff_len, if_incoming);
 
 		bat_dbg(DBG_BATMAN, "Forwarding packet: rebroadcast neighbour packet with direct link flag\n");
+		orig_unreference(orig_node);
 		return;
 	}
 
@@ -535,11 +568,13 @@ 
 	if (!is_bidirectional) {
 		bat_dbg(DBG_BATMAN,
 			"Drop packet: not received via bidirectional link\n");
+		orig_unreference(orig_node);
 		return;
 	}
 
 	if (is_duplicate) {
 		bat_dbg(DBG_BATMAN, "Drop packet: duplicate packet received\n");
+		orig_unreference(orig_node);
 		return;
 	}
 
@@ -547,6 +582,7 @@ 
 		"Forwarding packet: rebroadcast originator packet\n");
 	schedule_forward_packet(orig_node, ethhdr, batman_packet,
 				0, hna_buff_len, if_incoming);
+	orig_unreference(orig_node);
 }
 
 
@@ -585,12 +621,10 @@ 
 	if (result < sizeof(struct ethhdr) + sizeof(struct batman_packet))
 		return;
 
-	spin_lock(&orig_hash_lock);
 	receive_aggr_bat_packet(ethhdr,
 				packet_buff + sizeof(struct ethhdr),
 				result - sizeof(struct ethhdr),
 				batman_if);
-	spin_unlock(&orig_hash_lock);
 }
 
 static void recv_my_icmp_packet(struct ethhdr *ethhdr,
@@ -608,25 +642,28 @@ 
 
 	/* answer echo request (ping) */
 	/* get routing information */
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)hash_find(orig_hash,
-						   icmp_packet->orig));
+	orig_node = orig_find(icmp_packet->orig);
 
-	if ((orig_node != NULL) &&
-	    (orig_node->batman_if != NULL) &&
-	    (orig_node->router != NULL)) {
-		memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN);
-		memcpy(icmp_packet->orig, ethhdr->h_dest, ETH_ALEN);
-		icmp_packet->msg_type = ECHO_REPLY;
-		icmp_packet->ttl = TTL;
+	if (orig_node) {
+		spin_lock(&orig_node->lock);
 
-		send_raw_packet(packet_buff + sizeof(struct ethhdr),
-				result - sizeof(struct ethhdr),
-				orig_node->batman_if,
-				orig_node->router->addr);
+		if ((orig_node->batman_if != NULL) &&
+		    (orig_node->router != NULL)) {
+			memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN);
+			memcpy(icmp_packet->orig, ethhdr->h_dest, ETH_ALEN);
+			icmp_packet->msg_type = ECHO_REPLY;
+			icmp_packet->ttl = TTL;
+	
+			send_raw_packet(packet_buff + sizeof(struct ethhdr),
+					result - sizeof(struct ethhdr),
+					orig_node->batman_if,
+					orig_node->router->addr);
+		}
+
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 	}
 
-	spin_unlock(&orig_hash_lock);
 	return;
 }
 
@@ -642,33 +679,32 @@ 
 	addr_to_string(src_str, icmp_packet->orig);
 	addr_to_string(dst_str, icmp_packet->dst);
 
-	printk(KERN_WARNING "batman-adv:Warning - can't send packet from %s to %s: ttl exceeded\n", src_str, dst_str);
-
 	/* send TTL exceeded if packet is an echo request (traceroute) */
 	if (icmp_packet->msg_type != ECHO_REQUEST)
 		return;
 
 	/* get routing information */
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)
-		     hash_find(orig_hash, icmp_packet->orig));
+	orig_node = orig_find(icmp_packet->orig);
 
-	if ((orig_node != NULL) &&
-	    (orig_node->batman_if != NULL) &&
-	    (orig_node->router != NULL)) {
-		memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN);
-		memcpy(icmp_packet->orig, ethhdr->h_dest, ETH_ALEN);
-		icmp_packet->msg_type = TTL_EXCEEDED;
-		icmp_packet->ttl = TTL;
+	if (orig_node) {
+		spin_lock(&orig_node->lock);
 
-		send_raw_packet(packet_buff + sizeof(struct ethhdr),
-				result - sizeof(struct ethhdr),
-				orig_node->batman_if,
-				orig_node->router->addr);
+		if ((orig_node->batman_if != NULL) &&
+		    (orig_node->router != NULL)) {
+			memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN);
+			memcpy(icmp_packet->orig, ethhdr->h_dest, ETH_ALEN);
+			icmp_packet->msg_type = TTL_EXCEEDED;
+			icmp_packet->ttl = TTL;
 
+			send_raw_packet(packet_buff + sizeof(struct ethhdr),
+					result - sizeof(struct ethhdr),
+					orig_node->batman_if,
+					orig_node->router->addr);
+		}
+
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 	}
-
-	spin_unlock(&orig_hash_lock);
 }
 
 
@@ -713,24 +749,27 @@ 
 	}
 
 	/* get routing information */
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)
-		     hash_find(orig_hash, icmp_packet->dst));
+	orig_node = orig_find(icmp_packet->dst);
 
-	if ((orig_node != NULL) &&
-	    (orig_node->batman_if != NULL) &&
-	    (orig_node->router != NULL)) {
+	if (orig_node) {
+		spin_lock(&orig_node->lock);
 
-		/* decrement ttl */
-		icmp_packet->ttl--;
+	    if ((orig_node->batman_if != NULL) &&
+		    (orig_node->router != NULL)) {
 
-		/* route it */
-		send_raw_packet(packet_buff + sizeof(struct ethhdr),
-				result - sizeof(struct ethhdr),
-				orig_node->batman_if,
-				orig_node->router->addr);
+			/* decrement ttl */
+			icmp_packet->ttl--;
+
+			/* route it */
+			send_raw_packet(packet_buff + sizeof(struct ethhdr),
+					result - sizeof(struct ethhdr),
+					orig_node->batman_if,
+					orig_node->router->addr);
+		}
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
+
 	}
-	spin_unlock(&orig_hash_lock);
 }
 
 static void recv_unicast_packet(struct ethhdr *ethhdr,
@@ -781,23 +820,26 @@ 
 	}
 
 	/* get routing information */
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)
-		     hash_find(orig_hash, unicast_packet->dest));
+	orig_node = orig_find(unicast_packet->dest);
 
-	if ((orig_node != NULL) &&
-	    (orig_node->batman_if != NULL) &&
-	    (orig_node->router != NULL)) {
-		/* decrement ttl */
-		unicast_packet->ttl--;
+	if (orig_node) {
+		spin_lock(&orig_node->lock);
 
-		/* route it */
-		send_raw_packet(packet_buff + sizeof(struct ethhdr),
-				result - sizeof(struct ethhdr),
-				orig_node->batman_if,
-				orig_node->router->addr);
+	    if ((orig_node->batman_if != NULL) &&
+		    (orig_node->router != NULL)) {
+			/* decrement ttl */
+			unicast_packet->ttl--;
+	
+			/* route it */
+			send_raw_packet(packet_buff + sizeof(struct ethhdr),
+					result - sizeof(struct ethhdr),
+					orig_node->batman_if,
+					orig_node->router->addr);
+		}
+
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 	}
-	spin_unlock(&orig_hash_lock);
 }
 
 
@@ -833,20 +875,19 @@ 
 	if (is_my_mac(bcast_packet->orig))
 		return;
 
-	spin_lock(&orig_hash_lock);
-	orig_node = ((struct orig_node *)
-		     hash_find(orig_hash, bcast_packet->orig));
+	orig_node = orig_find(bcast_packet->orig);
 
-	if (orig_node == NULL) {
-		spin_unlock(&orig_hash_lock);
+	if (!orig_node)
 		return;
-	}
 
+	spin_lock(&orig_node->lock);
+
 	/* check flood history */
 	if (get_bit_status(orig_node->bcast_bits,
 			   orig_node->last_bcast_seqno,
 			   ntohs(bcast_packet->seqno))) {
-		spin_unlock(&orig_hash_lock);
+		spin_unlock(&orig_node->lock);
+		orig_unreference(orig_node);
 		return;
 	}
 
@@ -856,7 +897,8 @@ 
 			   orig_node->last_bcast_seqno, 1))
 		orig_node->last_bcast_seqno = ntohs(bcast_packet->seqno);
 
-	spin_unlock(&orig_hash_lock);
+	spin_unlock(&orig_node->lock);
+	orig_unreference(orig_node);
 
 	/* broadcast for me */
 	interface_rx(soft_device, packet_buff + hdr_size, result - hdr_size);
Index: batman-adv-kernelland/main.h
===================================================================
--- batman-adv-kernelland/main.h	(revision 1489)
+++ batman-adv-kernelland/main.h	(working copy)
@@ -124,7 +124,6 @@ 
 extern struct hlist_head forw_bcast_list;
 extern struct hashtable_t *orig_hash;
 
-extern spinlock_t orig_hash_lock;
 extern spinlock_t forw_bat_list_lock;
 extern spinlock_t forw_bcast_list_lock;