    slub: Check for page NULL before doing the node_match check · c25f195e
    Steven Rostedt authored
    In the -rt kernel (mrg), we hit the following dump:
    
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
    PGD a2d39067 PUD b1641067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
    CPU 3
    Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
    RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
    RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
    RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
    RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
    RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
    R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
    R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
    FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
    Stack:
     ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
     0000000001200011 0000000001200011 0000000000000000 0000000000000000
     00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
    Call Trace:
     [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
     [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
     [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
     [<ffffffff8104369a>] do_fork+0x5a/0x360
     [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
     [<ffffffff8100b068>] sys_clone+0x28/0x30
     [<ffffffff81527423>] stub_clone+0x13/0x20
     [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
    Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
    RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
     RSP <ffff8800a9b17d70>
    CR2: 0000000000000000
    ---[ end trace 0000000000000002 ]---
    
    Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
    with CONFIG_PREEMPT_RT set, spinlocks are converted to mutexes, although
    they still disable migration. The SLUB code is relatively lockless,
    though, and the spin_locks there are raw_spin_locks (not converted to
    mutexes), so I believe this bug can happen in mainline without any -rt
    features. The -rt patch is just good at triggering mainline bugs ;-)
    
    Anyway, looking at where this crashed, it seems that the page variable
    can be NULL when passed to the node_match() function (which does not
    check if it is NULL). When this happens we get the above panic.
    
    Since page is only used in slab_alloc() to check whether the node
    matches, I'm assuming that a NULL page can be treated as a mismatch,
    so that we fall through to the __slab_alloc() code. Is this a correct
    assumption?
    
    Acked-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Pekka Enberg <penberg@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
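
The check described in the message can be modelled in userspace roughly as follows. This is only a sketch, not the kernel code: `struct page_model`, `node_match_model()` and `needs_slow_path()` are hypothetical names invented for illustration, and the shape of the node comparison is an assumption based on the description above (the helper dereferences page without testing it for NULL). The point it shows is that testing page for NULL first short-circuits the condition, so the helper is never reached with a NULL pointer.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct page; only the node id matters here. */
struct page_model {
    int nid;
};

/*
 * Modelled on the description of node_match(): it dereferences page
 * without checking it for NULL, so callers must guarantee page is valid.
 */
static int node_match_model(const struct page_model *page, int node)
{
    if (node != NUMA_NO_NODE && page->nid != node)
        return 0;
    return 1;
}

/*
 * Returns 1 when the fast path has to be abandoned in favour of the
 * slow path (the __slab_alloc() call in the message), 0 otherwise.
 * The !page test comes before node_match_model(), so a NULL page is
 * treated as "node does not match" instead of being dereferenced.
 */
static int needs_slow_path(const void *object,
                           const struct page_model *page, int node)
{
    return !object || !page || !node_match_model(page, node);
}
```

With a non-NULL object, a NULL page and a specific node requested (the combination the register dump suggests), `needs_slow_path()` returns 1 without touching the page. A page on the requested node returns 0, and a page on a different node returns 1.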
slub.c 126 KB