    [PATCH] use RCU for IPC locking · bb468c02
    Andrew Morton authored
    Patch from Mingming, Rusty, Hugh, Dipankar, me:
    
    - Greatly reduces lock contention by using one lock per IPC ID: the
      global spinlock is removed and a spinlock is added to the
      kern_ipc_perm structure.
    
    - Uses Read-Copy-Update (RCU) in grow_ary() for lock-free resizing.
    
    - Wherever ipc_rmid() is called, defer the call to ipc_free() to an
      RCU callback.  This prevents ipc_lock() from returning an invalid
      pointer after ipc_rmid().  In addition, use a workqueue so that
      vmalloc()ed entries can also be freed via RCU.
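A minimal userspace sketch of the first two ideas, assuming C11 atomics in place of kernel primitives: each entry carries its own lock (an `atomic_flag` spinlock standing in for the spinlock added to kern_ipc_perm), and the ID array is resized by building a larger copy and publishing it with a single atomic pointer store, roughly the copy-then-publish step grow_ary() performs under RCU. The names `struct perm`, `struct id_ary`, `ids`, `perm_lock()`, `perm_unlock()` and `ary_grow()` are hypothetical, not the kernel's API. Real RCU also waits for a grace period before freeing the old array; this sketch does not implement one and simply hands the retired array back to the caller.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of kern_ipc_perm: every ID has its own lock,
 * so operations on different IDs do not contend on one global lock. */
struct perm {
    atomic_flag lock;
    int key;
};

/* Hypothetical ID array; readers reach it through one atomic pointer. */
struct id_ary {
    int size;
    struct perm *p[];   /* flexible array of entry pointers */
};

static _Atomic(struct id_ary *) ids;

static void perm_lock(struct perm *e)
{
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;   /* spin until this entry's lock is free */
}

static void perm_unlock(struct perm *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Copy-then-publish resize in the spirit of grow_ary(). Writers are
 * assumed to be serialized by some outer lock (not shown). The old array
 * is returned in *retired instead of being freed, because a concurrent
 * reader may still be using it; with real RCU it would be freed from a
 * callback after a grace period. Returns 0 on success, -1 on failure. */
static int ary_grow(int newsize, struct id_ary **retired)
{
    struct id_ary *old = atomic_load_explicit(&ids, memory_order_acquire);

    *retired = NULL;
    if (newsize <= 0)
        return -1;
    if (old && newsize <= old->size)
        return 0;   /* already big enough */

    struct id_ary *fresh =
        calloc(1, sizeof(*fresh) + (size_t)newsize * sizeof(fresh->p[0]));
    if (!fresh)
        return -1;
    fresh->size = newsize;
    if (old)
        memcpy(fresh->p, old->p, (size_t)old->size * sizeof(old->p[0]));

    /* Publish: readers see either the old or the new array, never a
     * partially built one. */
    atomic_store_explicit(&ids, fresh, memory_order_release);
    *retired = old;
    return 0;
}
```

Readers only load `ids` and take the per-entry lock, so a resize never blocks them; the cost is that freeing the old array must wait until no reader can still hold it, which is what the RCU callback provides.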
    
    Also some other changes:
    
    - Remove redundant ipc_lockall/ipc_unlockall
    
    - ipc_unlock() now takes the IPC ID pointer directly as its argument,
      avoiding an extra lookup in the array.
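A hypothetical sketch of why passing the pointer helps, again using C11 atomics rather than kernel spinlocks: `id_lock()` looks the ID up once and returns the entry, so `id_unlock()` can release the lock straight from that pointer, whereas an unlock-by-ID variant (`id_unlock_by_id()`, shown only for contrast) has to index the table a second time. The fixed-size `table`, `NR_IDS` and all function names are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-ID entry with its own lock. */
struct perm {
    atomic_flag lock;
    int key;
};

#define NR_IDS 16
static struct perm *table[NR_IDS];   /* hypothetical ID -> entry table */

/* Look up the ID once, take that entry's lock and return the entry
 * (cf. ipc_lock()). Returns NULL for an unused or out-of-range ID. */
static struct perm *id_lock(int id)
{
    if (id < 0 || id >= NR_IDS || table[id] == NULL)
        return NULL;
    struct perm *e = table[id];
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;   /* spin */
    return e;
}

/* Contrast only: unlocking by ID repeats the table lookup.
 * Assumes id is valid and currently locked. */
static void id_unlock_by_id(int id)
{
    atomic_flag_clear_explicit(&table[id]->lock, memory_order_release);
}

/* Pointer-based unlock (cf. ipc_unlock() after this patch): the caller
 * already holds the pointer returned by id_lock(), so no lookup. */
static void id_unlock(struct perm *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}
```

Typical use is `struct perm *p = id_lock(id); if (p) { /* ... */ id_unlock(p); }`, which keeps the lookup and the unlock tied to the same entry.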
    
    The changes are based on input from Hugh Dickins, Manfred Spraul and
    Dipankar Sarma.  In addition, Cliff White ran OSDL's dbt1 test on a
    2-way system against the earlier version of this patch.  The results
    show an improvement of about 2-6% in the average number of
    transactions per second.  Here is a summary of his tests:
    
                                2.5.42-mm2      2.5.42-mm2-ipclock
                                ----------------------------------
    Average over 5 runs     85.0 BT         89.8 BT
    Std Deviation 5 runs     7.4 BT          1.0 BT

    Average over 4 best     88.15 BT        90.2 BT
    Std Deviation 4 best     2.8 BT          0.5 BT
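A quick check of the "2-6%" figure, sketched in C with a hypothetical helper `improvement_pct()` that computes the relative change in average throughput between the two kernels:

```c
/* Percentage change from base to patched (positive means patched is
 * higher, i.e. more transactions per second). */
static double improvement_pct(double base, double patched)
{
    return (patched / base - 1.0) * 100.0;
}
```

`improvement_pct(85.0, 89.8)` is about 5.6 (all 5 runs) and `improvement_pct(88.15, 90.2)` is about 2.3 (best 4 runs), which brackets the quoted range.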
    
    
    Also, another test today from Bill Hartner:
    
    I tested Mingming's RCU ipc lock patch using a *new* microbenchmark,
    semopbench, which was written to test the performance of Mingming's
    patch.  I also ran a 3-hour stress test, and it completed successfully.
    
    Explanation of the microbenchmark is below the results.
    Here is a link to the microbenchmark source.
    
    http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/semopbench.c
    
    SUT: 8-way 700 MHz PIII
    
    I tested 2.5.44-mm2 and 2.5.44-mm2 plus the RCU ipc patch.
    
    >semopbench -g 64 -s 16 -n 16384 -r > sem.results.out
    >readprofile -m /boot/System.map | sort -nr -k1 > sem.profile.out
    
    The metric is seconds per repetition.  Lower is better.
    
    kernel              run 1     run 2
                        seconds   seconds
    ==================  =======   =======
    2.5.44-mm2            515.1     515.4
    2.5.44-mm2+rcu-ipc     46.7      46.7
    
    With Mingming's patch, the test completes about 11X faster (515 vs.
    46.7 seconds per repetition).
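The speedup is the ratio of run times. A hypothetical helper `speedup()` that averages the runs for each kernel before dividing:

```c
/* Speedup of the patched kernel: mean baseline time / mean patched time.
 * Both arrays hold n runs; the 1/n factors cancel, so sums suffice. */
static double speedup(const double *base, const double *patched, int n)
{
    double b = 0.0, p = 0.0;
    for (int i = 0; i < n; i++) {
        b += base[i];
        p += patched[i];
    }
    return b / p;
}
```

With `{515.1, 515.4}` and `{46.7, 46.7}` this gives about 11.0.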