    ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits · a1c4fb80
    Herton R. Krzesinski authored
    commit 602b8593 upstream.
    
    The current semaphore code allows a potential use after free: in
    exit_sem() we may free the task's sem_undo_list while another task,
    which called IPC_RMID on the same semaphore set, is still looping
    through that set and cleaning up its sem_undo entries in freeary().
    
    For example, run a test program [1] that keeps forking many
    processes, each of which does a semop call with the SEM_UNDO flag,
    while the parent removes the semaphore set with IPC_RMID right
    afterwards. On a kernel built with CONFIG_SLAB, CONFIG_DEBUG_SLAB and
    CONFIG_DEBUG_SPINLOCK, you can easily see something like the
    following in the kernel log:
    
       Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
       000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
       010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
       Prev obj: start=ffff88003b45c180, len=64
       000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
       010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
       Next obj: start=ffff88003b45c200, len=64
       000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
       010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
       BUG: spinlock wrong CPU on CPU#2, test/18028
       general protection fault: 0000 [#1] SMP
       Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
       CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
       RIP: spin_dump+0x53/0xc0
       Call Trace:
         spin_bug+0x30/0x40
         do_raw_spin_unlock+0x71/0xa0
         _raw_spin_unlock+0xe/0x10
         freeary+0x82/0x2a0
         ? _raw_spin_lock+0xe/0x10
         semctl_down.clone.0+0xce/0x160
         ? __do_page_fault+0x19a/0x430
         ? __audit_syscall_entry+0xa8/0x100
         SyS_semctl+0x236/0x2c0
         ? syscall_trace_leave+0xde/0x130
         entry_SYSCALL_64_fastpath+0x12/0x71
       Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
       RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
        RSP <ffff88003750fd68>
       ---[ end trace 783ebb76612867a0 ]---
       NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
       Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
       CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
       RIP: native_read_tsc+0x0/0x20
       Call Trace:
         ? delay_tsc+0x40/0x70
         __delay+0xf/0x20
         do_raw_spin_lock+0x96/0x140
         _raw_spin_lock+0xe/0x10
         sem_lock_and_putref+0x11/0x70
         SYSC_semtimedop+0x7bf/0x960
         ? handle_mm_fault+0xbf6/0x1880
         ? dequeue_task_fair+0x79/0x4a0
         ? __do_page_fault+0x19a/0x430
         ? kfree_debugcheck+0x16/0x40
         ? __do_page_fault+0x19a/0x430
         ? __audit_syscall_entry+0xa8/0x100
         ? do_audit_syscall_entry+0x66/0x70
         ? syscall_trace_enter_phase1+0x139/0x160
         SyS_semtimedop+0xe/0x10
         SyS_semop+0x10/0x20
         entry_SYSCALL_64_fastpath+0x12/0x71
       Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
       Kernel panic - not syncing: softlockup: hung tasks
    
    I wasn't able to trigger any badness on a recent kernel without
    these debug options enabled. However, I have softlockup reports in
    the semaphore code on some kernel versions that are similar to the
    above (the scenario is seen on some servers running IBM DB2, which
    uses semaphore syscalls).
    
    This patch fixes the race against freeary() by acquiring or waiting
    on the sem_undo_list lock as necessary: exit_sem() can race with
    freeary() while freeary() sets un->semid to -1 and removes the same
    sem_undo from list_proc, or when it removes the last sem_undo.
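    
    The "wait on the lock before freeing" idea can be sketched in user
    space. This is only an analogy under stated assumptions, not the
    kernel code: struct undo_list, remover(), exit_path() and run_demo()
    are hypothetical names, a pthread mutex stands in for the
    sem_undo_list spinlock, and a lock/unlock pair stands in for waiting
    until the lock holder has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's sem_undo_list. */
struct undo_list {
	pthread_mutex_t lock;
	atomic_int nentries;	/* entries still linked on the list */
	int touched;		/* written by the remover under the lock */
};

/* Plays the role of freeary(): unlinks the last entry under the lock. */
static void *remover(void *arg)
{
	struct undo_list *ulp = arg;

	pthread_mutex_lock(&ulp->lock);
	atomic_store(&ulp->nentries, 0);	/* list now looks empty... */
	ulp->touched++;				/* ...but ulp is still in use */
	pthread_mutex_unlock(&ulp->lock);
	return NULL;
}

/* Plays the role of exit_sem(): frees the list once it is empty. */
static int exit_path(struct undo_list *ulp)
{
	int touched;

	while (atomic_load(&ulp->nentries) != 0)
		sched_yield();

	/*
	 * The list is empty, but the remover may still hold the lock.
	 * Take and drop the lock so that it has finished before we free.
	 */
	pthread_mutex_lock(&ulp->lock);
	touched = ulp->touched;
	pthread_mutex_unlock(&ulp->lock);

	pthread_mutex_destroy(&ulp->lock);
	free(ulp);
	return touched;
}

/* Runs one remover against one exiting task; returns the final count. */
int run_demo(void)
{
	struct undo_list *ulp = malloc(sizeof(*ulp));
	pthread_t tid;
	int rc, touched;

	assert(ulp != NULL);
	pthread_mutex_init(&ulp->lock, NULL);
	atomic_init(&ulp->nentries, 1);
	ulp->touched = 0;

	rc = pthread_create(&tid, NULL, remover, ulp);
	assert(rc == 0);
	(void)rc;

	touched = exit_path(ulp);
	pthread_join(tid, NULL);
	return touched;
}
```

    Calling run_demo() from a small main() built with -pthread exercises
    the sketch. Without the lock/unlock pair in exit_path(), the free()
    could run while remover() is still inside its critical section,
    which is the user-space counterpart of the use after free described
    above.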
    
    With the patch applied, I am unable to reproduce the problem using
    the test case [1].
    
    [1] Test case used below:
    
        #include <stdio.h>
        #include <sys/types.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>
        #include <sys/wait.h>
        #include <stdlib.h>
        #include <time.h>
        #include <unistd.h>
        #include <errno.h>
    
        #define NSEM 1
        #define NSET 5
    
        int sid[NSET];
    
        void thread()
        {
                struct sembuf op;
                int s;
                pid_t pid = getpid();
    
                s = rand() % NSET;
                op.sem_num = pid % NSEM;
                op.sem_op = 1;
                op.sem_flg = SEM_UNDO;
    
                semop(sid[s], &op, 1);
                exit(EXIT_SUCCESS);
        }
    
        void create_set()
        {
                int i, j;
                pid_t p;
                union {
                        int val;
                        struct semid_ds *buf;
                        unsigned short int *array;
                        struct seminfo *__buf;
                } un;
    
                /* Create and initialize semaphore set */
                for (i = 0; i < NSET; i++) {
                        sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                        if (sid[i] < 0) {
                                perror("semget");
                                exit(EXIT_FAILURE);
                        }
                }
                un.val = 0;
                for (i = 0; i < NSET; i++) {
                        for (j = 0; j < NSEM; j++) {
                                if (semctl(sid[i], j, SETVAL, un) < 0)
                                        perror("semctl");
                        }
                }
    
                /* Launch threads that operate on semaphore set */
                for (i = 0; i < NSEM * NSET * NSET; i++) {
                        p = fork();
                        if (p < 0)
                                perror("fork");
                        if (p == 0)
                                thread();
                }
    
                /* Free semaphore set */
                for (i = 0; i < NSET; i++) {
                        if (semctl(sid[i], NSEM, IPC_RMID))
                                perror("IPC_RMID");
                }
    
                /* Wait for forked processes to exit */
                while (wait(NULL)) {
                        if (errno == ECHILD)
                                break;
                };
        }
    
        int main(int argc, char **argv)
        {
                pid_t p;
    
                srand(time(NULL));
    
                while (1) {
                        p = fork();
                        if (p < 0) {
                                perror("fork");
                                exit(EXIT_FAILURE);
                        }
                        if (p == 0) {
                                create_set();
                                goto end;
                        }
    
                        /* Wait for forked processes to exit */
                        while (wait(NULL)) {
                                if (errno == ECHILD)
                                        break;
                        };
                }
        end:
                return 0;
        }
    
    [akpm@linux-foundation.org: use normal comment layout]
    Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
    Acked-by: Manfred Spraul <manfred@colorfullife.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Rafael Aquini <aquini@redhat.com>
    CC: Aristeu Rozanski <aris@redhat.com>
    Cc: David Jeffery <djeffery@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    [bwh: Backported to 3.2: adjust context]
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>