1. 06 Aug, 2015 40 commits
    • Ben Greear's avatar
      Fix lockup related to stop_machine being stuck in __do_softirq. · 383253cf
      Ben Greear authored
      commit 34376a50 upstream.
      
      The stop machine logic can lock up if all but one of the migration
      threads make it through the disable-irq step and the one remaining
      thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
      that it has a bail-out based on jiffies timeout, but in the lockup case,
      jiffies itself is not incremented.
      
      To work around this, re-add the max_restart counter in __do_irq and stop
      processing irqs after 10 restarts.
      
      Thanks to Tejun Heo and Rusty Russell and others for helping me track
      this down.
      
      This was introduced in 3.9 by commit c10d7367 ("softirq: reduce
      latencies").
      
      It may be worth looking into ath9k to see if it has issues with its irq
      handler at a later date.
      
      The hang stack traces look something like this:
      
          ------------[ cut here ]------------
          WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
          Watchdog detected hard LOCKUP on cpu 2
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
          Call Trace:
           <NMI>   warn_slowpath_common+0x85/0x9f
            warn_slowpath_fmt+0x46/0x48
            watchdog_overflow_callback+0x9c/0xa7
            __perf_event_overflow+0x137/0x1cb
            perf_event_overflow+0x14/0x16
            intel_pmu_handle_irq+0x2dc/0x359
            perf_event_nmi_handler+0x19/0x1b
            nmi_handle+0x7f/0xc2
            do_nmi+0xbc/0x304
            end_repeat_nmi+0x1e/0x2e
           <<EOE>>
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
          ---[ end trace 4947dfa9b0a4cec3 ]---
          BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          irq event stamp: 835637905
          hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
          hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
          softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
          softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
          CPU 1
          Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
          RIP: tasklet_hi_action+0xf0/0xf0
          Process migration/1
          Call Trace:
           <IRQ>
            __do_softirq+0x117/0x257
            irq_exit+0x5f/0xbb
            smp_apic_timer_interrupt+0x8a/0x98
            apic_timer_interrupt+0x72/0x80
           <EOI>
            printk+0x4d/0x4f
            stop_machine_cpu_stop+0x22c/0x274
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
      Signed-off-by: default avatarBen Greear <greearb@candelatech.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarPekka Riikonen <priikone@iki.fi>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      383253cf
    • Eric Dumazet's avatar
      softirq: reduce latencies · 29a07c1e
      Eric Dumazet authored
      commit c10d7367 upstream.
      
      In various network workloads, __do_softirq() latencies can be up
      to 20 ms if HZ=1000, and 200 ms if HZ=100.
      
      This is because we iterate 10 times in the softirq dispatcher,
      and some actions can consume a lot of cycles.
      
      This patch changes the fallback to ksoftirqd condition to :
      
      - A time limit of 2 ms.
      - need_resched() being set on current task
      
      When one of this condition is met, we wakeup ksoftirqd for further
      softirq processing if we still have pending softirqs.
      
      Using need_resched() as the only condition can trigger RCU stalls,
      as we can keep BH disabled for too long.
      
      I ran several benchmarks and got no significant difference in
      throughput, but a very significant reduction of latencies (one order
      of magnitude) :
      
      In following bench, 200 antagonist "netperf -t TCP_RR" are started in
      background, using all available cpus.
      
      Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
      IRQ (hard+soft)
      
      Before patch :
      
      # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
      RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
      to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
      RT_LATENCY=550110.424
      MIN_LATENCY=146858
      MAX_LATENCY=997109
      P50_LATENCY=305000
      P90_LATENCY=550000
      P99_LATENCY=710000
      MEAN_LATENCY=376989.12
      STDDEV_LATENCY=184046.92
      
      After patch :
      
      # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
      RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
      to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
      RT_LATENCY=40545.492
      MIN_LATENCY=9834
      MAX_LATENCY=78366
      P50_LATENCY=33583
      P90_LATENCY=59000
      P99_LATENCY=69000
      MEAN_LATENCY=38364.67
      STDDEV_LATENCY=12865.26
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      29a07c1e
    • Dan McGee's avatar
      powerpc+sparc64/mm: Remove hack in mmap randomize layout · f9cedbf0
      Dan McGee authored
      commit fa8cbaaf upstream.
      
      Since commit 8a0a9bd4, this comment in mmap_rnd() does not
      hold true as the value returned by get_random_int() will in fact be
      
      different every single call. Remove the comment and simplify the code
      back to its original desired form.
      
      This reverts commit a5adc91a which is no longer necessary and
      also fixes the sparc code that copied this same adjustment.
      Signed-off-by: default avatarDan McGee <dpmcgee@gmail.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Moritz Mühlenhoff <jmm@inutil.org>
      f9cedbf0
    • Mark Grondona's avatar
      __ptrace_may_access() should not deny sub-threads · f062bd6e
      Mark Grondona authored
      commit 73af963f upstream.
      
      __ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
      current, this can can lead to surprising results.
      
      For example, a sub-thread can't readlink("/proc/self/exe") if the
      executable is not readable.  setup_new_exec()->would_dump() notices that
      inode_permission(MAY_READ) fails and then it does
      set_dumpable(suid_dumpable).  After that get_dumpable() fails.
      
      (It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
      could add PTRACE_MODE_NODUMPABLE)
      
      Change __ptrace_may_access() to use same_thread_group() instead of "task
      == current".  Any security check is pointless when the tasks share the
      same ->mm.
      Signed-off-by: default avatarMark Grondona <mgrondona@llnl.gov>
      Signed-off-by: default avatarBen Woodard <woodard@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Sheng Yong <shengyong1@huawei.com>
      f062bd6e
    • Oleg Nesterov's avatar
      include/linux/sched.h: don't use task->pid/tgid in same_thread_group/has_group_leader_pid · a7b4d513
      Oleg Nesterov authored
      commit e1403b8e upstream.
      
      task_struct->pid/tgid should go away.
      
      1. Change same_thread_group() to use task->signal for comparison.
      
      2. Change has_group_leader_pid(task) to compare task_pid(task) with
         signal->leader_pid.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Reviewed-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Sheng Yong <shengyong1@huawei.com>
      a7b4d513
    • Feng Tang's avatar
      x86/reboot: Fix a warning message triggered by stop_other_cpus() · ea475029
      Feng Tang authored
      commit 55c844a4 upstream.
      
      When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
      always see this warning msg:
      
      Restarting system.
      machine restart
      ------------[ cut here ]------------
      WARNING: at arch/x86/kernel/smp.c:125
      native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
      Modules linked in: igb [last unloaded: scsi_wait_scan]
      Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
      Call Trace:
       <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
       [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
       [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
       [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
       [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
       [<ffffffff81036768>] update_process_times+0x60/0x70
       [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
       [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
       [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
       [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
       [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
       [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
       <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
       [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
       [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
       [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
       [<ffffffff81018926>] machine_shutdown+0xa/0xc
       [<ffffffff8101897e>] native_machine_restart+0x20/0x32
       [<ffffffff810189b0>] machine_restart+0xa/0xc
       [<ffffffff8103b196>] kernel_restart+0x47/0x4c
       [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
       [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
       [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
       [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
       [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
       [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
      ---[ end trace 320af5cb1cb60c5b ]---
      
      The root cause seems to be the
      default_send_IPI_mask_allbutself_phys() takes quite some time (I
      measured it could be several ms) to complete sending NMIs to all
      the other 23 CPUs, and for HZ=250/1000 system, the time is long
      enough for a timer interrupt to happen, which will in turn
      trigger to kick load balance to a stopped CPU and cause this
      warning in native_smp_send_reschedule().
      
      So disabling the local irq before stop_other_cpu() can fix this
      problem (tested 25 times reboot ok), and it is fine as there
      should be nobody caring the timer interrupt in such reboot
      stage.
      
      The latest 3.4 kernel slightly changes this behavior by sending
      REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR
      fails, and this patch is still needed to prevent the problem.
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Vinson Lee <vlee@twopensource.com>
      ea475029
    • Jim Snow's avatar
      sb_edac: Fix erroneous bytes->gigabytes conversion · 4e2b0364
      Jim Snow authored
      commit 8c009100 upstream.
      Signed-off-by: default avatarJim Snow <jim.snow@intel.com>
      Signed-off-by: default avatarLukasz Anaczkowski <lukasz.anaczkowski@intel.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      [lizf: Backported to 3.4:
       - adjust context
       - use debugf0() instead of edac_dbg()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4e2b0364
    • Mauro Carvalho Chehab's avatar
      Fix sb_edac compilation with 32 bits kernels · f4782e32
      Mauro Carvalho Chehab authored
      commit 5b889e37 upstream.
      
      As reported by Josh Boyer <jwboyer@redhat.com>:
      >	drivers/edac/sb_edac.c: In function 'get_memory_error_data':
      > 	drivers/edac/sb_edac.c:861:2: warning: left shift count >= width of type
      > 	[enabled by default]
      > 	<snip>
      > 	ERROR: "__udivdi3" [drivers/edac/sb_edac.ko] undefined!
      > 	make[1]: *** [__modpost] Error 1
      > 	make: *** [modules] Error 2
      
      PS.: compile-tested only
      Reported-by: default avatarJosh Boyer <jwboyer@redhat.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@redhat.com>
      [bwh: Prerequisite for "sb_edac: Fix erroneous bytes->gigabytes conversion"]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f4782e32
    • Konrad Rzeszutek Wilk's avatar
      config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected · 38e4433d
      Konrad Rzeszutek Wilk authored
      commit a6dfa128 upstream.
      
      A huge amount of NIC drivers use the DMA API, however if
      compiled under 32-bit an very important part of the DMA API can
      be ommitted leading to the drivers not working at all
      (especially if used with 'swiotlb=force iommu=soft').
      
      As Prashant Sreedharan explains it: "the driver [tg3] uses
      DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
      the dma "mapping" and dma_unmap_addr() to get the "mapping"
      value. On most of the platforms this is a no-op, but ... with
      "iommu=soft and swiotlb=force" this house keeping is required,
      ... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
      instead of the DMA address."
      
      As such enable this even when using 32-bit kernels.
      Reported-by: default avatarIan Jackson <Ian.Jackson@eu.citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarPrashant Sreedharan <prashant@broadcom.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: cascardo@linux.vnet.ibm.com
      Cc: david.vrabel@citrix.com
      Cc: sanjeevb@broadcom.com
      Cc: siva.kallam@broadcom.com
      Cc: vyasevich@gmail.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2: also change the def_bool (cond) to
       def_bool y + depends]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      38e4433d
    • Junling Zheng's avatar
      net: socket: Fix the wrong returns for recvmsg and sendmsg · f09662ea
      Junling Zheng authored
      Based on 08adb7da upstream.
      
      We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL
      was expected.
      
      We tested it through the recvmsg01 testcase come from LTP testsuit. It set
      msg->msg_namelen to -1 and the recvmsg syscall returned errno 14, which is
      unexpected (errno 22 is expected):
      
      recvmsg01    4  TFAIL  :  invalid socket length ; returned -1 (expected -1),
      errno 14 (expected 22)
      
      Linux mainline has no this bug for commit 08adb7da fixes it accidentally.
      However, it is too large and complex to be backported to LTS 3.10.
      
      Commit 281c9c36 (net: compat: Update get_compat_msghdr() to match
      copy_msghdr_from_user() behaviour) made get_compat_msghdr() return
      error if msg_sys->msg_namelen was negative, which changed the behaviors
      of recvmsg and sendmsg syscall in a lib32 system:
      
      Before commit 281c9c36, get_compat_msghdr() wouldn't fail and it would
      return -EINVAL in move_addr_to_user() or somewhere if msg_sys->msg_namelen
      was invalid and then syscall returned -EINVAL, which is correct.
      
      And now, when msg_sys->msg_namelen is negative, get_compat_msghdr() will
      fail and wants to return -EINVAL, however, the outer syscall will return
      -EFAULT directly, which is unexpected.
      
      This patch gets the return value of get_compat_msghdr() as well as
      copy_msghdr_from_user(), then returns this expected value if
      get_compat_msghdr() fails.
      
      Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour)
      Signed-off-by: default avatarJunling Zheng <zhengjunling@huawei.com>
      Signed-off-by: default avatarHanbing Xu <xuhanbing@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f09662ea
    • Joonsoo Kim's avatar
      slub: refactoring unfreeze_partials() · 3ecadd2f
      Joonsoo Kim authored
      commit 43d77867 upstream.
      
      Current implementation of unfreeze_partials() is so complicated,
      but benefit from it is insignificant. In addition many code in
      do {} while loop have a bad influence to a fail rate of cmpxchg_double_slab.
      Under current implementation which test status of cpu partial slab
      and acquire list_lock in do {} while loop,
      we don't need to acquire a list_lock and gain a little benefit
      when front of the cpu partial slab is to be discarded, but this is a rare case.
      In case that add_partial is performed and cmpxchg_double_slab is failed,
      remove_partial should be called case by case.
      
      I think that these are disadvantages of current implementation,
      so I do refactoring unfreeze_partials().
      
      Minimizing code in do {} while loop introduce a reduced fail rate
      of cmpxchg_double_slab. Below is output of 'slabinfo -r kmalloc-256'
      when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is done.
      
      ** before **
      Cmpxchg_double Looping
      ------------------------
      Locked Cmpxchg Double redos   182685
      Unlocked Cmpxchg Double redos 0
      
      ** after **
      Cmpxchg_double Looping
      ------------------------
      Locked Cmpxchg Double redos   177995
      Unlocked Cmpxchg Double redos 1
      
      We can see cmpxchg_double_slab fail rate is improved slightly.
      
      Bolow is output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.
      
      ** before **
       Performance counter stats for './hackbench 50 process 4000' (30 runs):
      
           108517.190463 task-clock                #    7.926 CPUs utilized            ( +-  0.24% )
               2,919,550 context-switches          #    0.027 M/sec                    ( +-  3.07% )
                 100,774 CPU-migrations            #    0.929 K/sec                    ( +-  4.72% )
                 124,201 page-faults               #    0.001 M/sec                    ( +-  0.15% )
         401,500,234,387 cycles                    #    3.700 GHz                      ( +-  0.24% )
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
         250,576,913,354 instructions              #    0.62  insns per cycle          ( +-  0.13% )
          45,934,956,860 branches                  #  423.297 M/sec                    ( +-  0.14% )
             188,219,787 branch-misses             #    0.41% of all branches          ( +-  0.56% )
      
            13.691837307 seconds time elapsed                                          ( +-  0.24% )
      
      ** after **
       Performance counter stats for './hackbench 50 process 4000' (30 runs):
      
           107784.479767 task-clock                #    7.928 CPUs utilized            ( +-  0.22% )
               2,834,781 context-switches          #    0.026 M/sec                    ( +-  2.33% )
                  93,083 CPU-migrations            #    0.864 K/sec                    ( +-  3.45% )
                 123,967 page-faults               #    0.001 M/sec                    ( +-  0.15% )
         398,781,421,836 cycles                    #    3.700 GHz                      ( +-  0.22% )
         <not supported> stalled-cycles-frontend
         <not supported> stalled-cycles-backend
         250,189,160,419 instructions              #    0.63  insns per cycle          ( +-  0.09% )
          45,855,370,128 branches                  #  425.436 M/sec                    ( +-  0.10% )
             169,881,248 branch-misses             #    0.37% of all branches          ( +-  0.43% )
      
            13.596272341 seconds time elapsed                                          ( +-  0.22% )
      
      No regression is found, but rather we can see slightly better result.
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarPekka Enberg <penberg@kernel.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Zefan Li <lizefan@huawei.com>
      3ecadd2f
    • Ben Hutchings's avatar
      debugfs: Fix statfs() regression in 3.2.69 · 20a5d5d4
      Ben Hutchings authored
      Commit 915f4f86 ("debugfs: leave freeing a symlink body until inode
      eviction", commit 0db59e59 upstream) changed debugfs to define its
      own super_operations and implement the evict_inode operation.
      
      Luis Henriques pointed out that it needs to define the statfs
      operation, as in simple_super_operations which it was using before.
      Reported-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      20a5d5d4
    • Alexander Sverdlin's avatar
      sctp: Fix race between OOTB responce and route removal · 117b8a10
      Alexander Sverdlin authored
      [ Upstream commit 29c4afc4 ]
      
      There is NULL pointer dereference possible during statistics update if the route
      used for OOTB responce is removed at unfortunate time. If the route exists when
      we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
      ABORT, but in the meantime route is removed under our feet, we take "no_route"
      path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
      
      But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
      sctp_transport_set_owner() and therefore there is no asoc associated with this
      packet. Probably temporary asoc just for OOTB responces is overkill, so just
      introduce a check like in all other places in sctp_packet_transmit(), where
      "asoc" is dereferenced.
      
      To reproduce this, one needs to
      0. ensure that sctp module is loaded (otherwise ABORT is not generated)
      1. remove default route on the machine
      2. while true; do
           ip route del [interface-specific route]
           ip route add [interface-specific route]
         done
      3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
         responce
      
      On x86_64 the crash looks like this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      PGD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: ...
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-1-ARCH #1
      Hardware name: ...
      task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
      RIP: 0010:[<ffffffffa05ec9ac>]  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      RSP: 0018:ffff880127c037b8  EFLAGS: 00010296
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
      RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
      RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
      R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
      R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
      FS:  0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
      Stack:
       ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
       ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
       0000000000000000 0000000000000001 0000000000000000 0000000000000000
      Call Trace:
       <IRQ>
       [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
       [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
       [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
       [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
       [<ffffffff810e0329>] ? update_process_times+0x59/0x60
       [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
       [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
       [<ffffffff8101f599>] ? read_tsc+0x9/0x10
       [<ffffffff8116d4b5>] ? put_page+0x55/0x60
       [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
       [<ffffffff81462b68>] ? skb_free_head+0x58/0x80
       [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
       [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
       [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
       [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
       [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
       [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
       [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
       [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
       [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
       [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
       [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
       [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
       [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
       [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
       [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
       [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
       [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
       [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
       [<ffffffff8147896a>] net_rx_action+0x21a/0x360
       [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
       [<ffffffff8107912d>] irq_exit+0xad/0xb0
       [<ffffffff8157d158>] do_IRQ+0x58/0xf0
       [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
       <EOI>
       [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
       [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
       [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
       [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
       [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
       [<ffffffff8156b365>] rest_init+0x85/0x90
       [<ffffffff818eb035>] start_kernel+0x48b/0x4ac
       [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
       [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
       [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
      Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
      RIP  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
       RSP <ffff880127c037b8>
      CR2: 0000000000000020
      ---[ end trace 5aec7fd2dc983574 ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
      drm_kms_helper: panic occurred, switching back to text console
      ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: sctp alway uses init_net]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      117b8a10
    • Julian Anastasov's avatar
      neigh: do not modify unlinked entries · 818ffedf
      Julian Anastasov authored
      [ Upstream commit 2c51a97f ]
      
      The lockless lookups can return entry that is unlinked.
      Sometimes they get reference before last neigh_cleanup_and_release,
      sometimes they do not need reference. Later, any
      modification attempts may result in the following problems:
      
      1. entry is not destroyed immediately because neigh_update
      can start the timer for dead entry, eg. on change to NUD_REACHABLE
      state. As result, entry lives for some time but is invisible
      and out of control.
      
      2. __neigh_event_send can run in parallel with neigh_destroy
      while refcnt=0 but if timer is started and expired refcnt can
      reach 0 for second time leading to second neigh_destroy and
      possible crash.
      
      Thanks to Eric Dumazet and Ying Xue for their work and analyze
      on the __neigh_event_send change.
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Fixes: a263b309 ("ipv4: Make neigh lookups directly in output packet path.")
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: drop change to __neigh_set_probe_once() which
       we don't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      818ffedf
    • Willem de Bruijn's avatar
      packet: avoid out of bounds read in round robin fanout · d77456d7
      Willem de Bruijn authored
      [ Upstream commit 468479e6 ]
      
      PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
      f->num_members. It returns the old value unconditionally, but
      f->num_members may have changed since the last store. Ensure
      that the return value is always < num.
      
      When modifying the logic, simplify it further by replacing the loop
      with an unconditional atomic increment.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Adjust context
       - Demux functions return a sock pointer, not an index]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d77456d7
    • Eric Dumazet's avatar
      packet: read num_members once in packet_rcv_fanout() · f5a27902
      Eric Dumazet authored
      [ Upstream commit f98f4514 ]
      
      We need to tell compiler it must not read f->num_members multiple
      times. Otherwise testing if num is not zero is flaky, and we could
      attempt an invalid divide by 0 in fanout_demux_cpu()
      
      Note bug was present in packet_rcv_fanout_hash() and
      packet_rcv_fanout_lb() but final 3.1 had a simple location
      after commit 95ec3eb4 ("packet: Add 'cpu' fanout policy.")
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f5a27902
    • Nikolay Aleksandrov's avatar
      bridge: fix br_stp_set_bridge_priority race conditions · 41431e40
      Nikolay Aleksandrov authored
      [ Upstream commit 2dab80a8 ]
      
      After the ->set() spinlocks were removed br_stp_set_bridge_priority
      was left running without any protection when used via sysfs. It can
      race with port add/del and could result in use-after-free cases and
      corrupted lists. Tested by running port add/del in a loop with stp
      enabled while setting priority in a loop, crashes are easily
      reproducible.
      The spinlocks around sysfs ->set() were removed in commit:
      14f98f25 ("bridge: range check STP parameters")
      There's also a race condition in the netlink priority support that is
      fixed by this change, but it was introduced recently and the fixes tag
      covers it, just in case it's needed the commit is:
      af615762 ("bridge: add ageing_time, stp_state, priority over netlink")
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: 14f98f25 ("bridge: range check STP parameters")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      41431e40
    • Ian Campbell's avatar
      xen: netback: read hotplug script once at start of day. · f563f5e0
      Ian Campbell authored
      [ Upstream commit 31a41898 ]
      
      When we come to tear things down in netback_remove() and generate the
      uevent it is possible that the xenstore directory has already been
      removed (details below).
      
      In such cases netback_uevent() won't be able to read the hotplug
      script and will write a xenstore error node.
      
      A recent change to the hypervisor exposed this race such that we now
      sometimes lose it (where apparently we didn't ever before).
      
      Instead read the hotplug script configuration during setup and use it
      for the lifetime of the backend device.
      
      The apparently more obvious fix of moving the transition to
      state=Closed in netback_remove() to after the uevent does not work
      because it is possible that we are already in state=Closed (in
      reaction to the guest having disconnected as it shutdown). Being
      already in Closed means the toolstack is at liberty to start tearing
      down the xenstore directories. In principal it might be possible to
      arrange to unregister the device sooner (e.g on transition to Closing)
      such that xenstore would still be there but this state machine is
      fragile and prone to anger...
      
      A modern Xen system only relies on the hotplug uevent for driver
      domains, when the backend is in the same domain as the toolstack it
      will run the necessary setup/teardown directly in the correct sequence
      wrt xenstore changes.
      Signed-off-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f563f5e0
    • Mark Salyzyn's avatar
      unix/caif: sk_socket can disappear when state is unlocked · f25e48cb
      Mark Salyzyn authored
      [ Upstream commit b48732e4 ]
      
      got a rare NULL pointer dereference in clear_bit
      Signed-off-by: default avatarMark Salyzyn <salyzyn@android.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      ----
      v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
      v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f25e48cb
    • Richard Cochran's avatar
      net: dp83640: fix broken calibration routine. · dbd061fc
      Richard Cochran authored
      [ Upstream commit 397a253a ]
      
      Currently, the calibration function that corrects the initial offsets
      among multiple devices only works the first time.  If the function is
      called more than once, the calibration fails and bogus offsets will be
      programmed into the devices.
      
      In a well hidden spot, the device documentation tells that trigger indexes
      0 and 1 are special in allowing the TRIG_IF_LATE flag to actually work.
      
      This patch fixes the issue by using one of the special triggers during the
      recalibration method.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      dbd061fc
    • Scott Wood's avatar
      powerpc: Don't skip ePAPR spin-table CPUs · dfbbd2ee
      Scott Wood authored
      commit 6663a4fa upstream.
      
      Commit 59a53afe "powerpc: Don't setup
      CPUs with bad status" broke ePAPR SMP booting.  ePAPR says that CPUs
      that aren't presently running shall have status of disabled, with
      enable-method being used to determine whether the CPU can be enabled.
      
      Fix by checking for spin-table, which is currently the only supported
      enable-method.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Emil Medve <Emilian.Medve@Freescale.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Jonathan Toppins <jtoppins@cumulusnetworks.com>
      dfbbd2ee
    • Anton Blanchard's avatar
      powerpc: Make logical to real cpu mapping code endian safe · 86e059f8
      Anton Blanchard authored
      commit ac13282d upstream.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      [jt: fixed up context due to commit 59a53afe
           already being backported.]
      Signed-off-by: default avatarJonathan Toppins <jtoppins@cumulusnetworks.com>
      Reviewed-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Acked-by: default avatarCurt Brune <curt@cumulusnetworks.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      86e059f8
    • Thierry Reding's avatar
      dt: Add empty of_property_match_string() function · e658cbfd
      Thierry Reding authored
      commit bd3d5500 upstream.
      
      This commit adds an empty of_property_match_string() function for
      !CONFIG_OF builds.
      Acked-by: default avatarRob Herring <rob.herring@calxeda.com>
      Signed-off-by: default avatarThierry Reding <thierry.reding@avionic-design.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Jonathan Toppins <jtoppins@cumulusnetworks.com>
      e658cbfd
    • Grant Likely's avatar
      of: Add of_property_match_string() to find index into a string list · 64e195f6
      Grant Likely authored
      commit 7aff0fe3 upstream.
      
      Add a helper function for finding the index of a string in a string
      list property.  This helper is useful for bindings that use a separate
      *-name property for attaching names to tuples in another property such
      as 'reg' or 'gpios'.
      Signed-off-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      [jt: dropped changes to files not existing in the 3.2 tree]
      Signed-off-by: default avatarJonathan Toppins <jtoppins@cumulusnetworks.com>
      Reviewed-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Acked-by: default avatarCurt Brune <curt@cumulusnetworks.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      64e195f6
    • Hans Schillstrom's avatar
      ipvs: kernel oops - do_ip_vs_get_ctl · 29f163de
      Hans Schillstrom authored
      commit 8537de8a upstream.
      
      Change order of init so netns init is ready
      when register ioctl and netlink.
      
      Ver2
      	Whitespace fixes and __init added.
      Reported-by: default avatar"Ryan O'Hara" <rohara@redhat.com>
      Signed-off-by: default avatarHans Schillstrom <hans.schillstrom@ericsson.com>
      Acked-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Andreas Unterkircher <unki@netshadow.at>
      29f163de
    • Marcelo Ricardo Leitner's avatar
      sctp: fix ASCONF list handling · 001b7cc9
      Marcelo Ricardo Leitner authored
      commit 2d45a02d upstream.
      
      ->auto_asconf_splist is per namespace and mangled by functions like
      sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
      
      Also, the call to inet_sk_copy_descendant() was backuping
      ->auto_asconf_list through the copy but was not honoring
      ->do_auto_asconf, which could lead to list corruption if it was
      different between both sockets.
      
      This commit thus fixes the list handling by using ->addr_wq_lock
      spinlock to protect the list. A special handling is done upon socket
      creation and destruction for that. Error handlig on sctp_init_sock()
      will never return an error after having initialized asconf, so
      sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
      will be take on sctp_close_sock(), before locking the socket, so we
      don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
      
      Instead of taking the lock on sctp_sock_migrate() for copying and
      restoring the list values, it's preferred to avoid rewritting it by
      implementing sctp_copy_descendant().
      
      Issue was found with a test application that kept flipping sysctl
      default_auto_asconf on and off, but one could trigger it by issuing
      simultaneous setsockopt() calls on multiple sockets or by
      creating/destroying sockets fast enough. This is only triggerable
      locally.
      
      Fixes: 9f7d653b ("sctp: Add Auto-ASCONF support (core).")
      Reported-by: default avatarJi Jianwen <jiji@redhat.com>
      Suggested-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Suggested-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Adjust filename, context
       - Most per-netns state is global]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      001b7cc9
    • Eric Dumazet's avatar
      udp: fix behavior of wrong checksums · 556574d9
      Eric Dumazet authored
      commit beb39db5 upstream.
      
      We have two problems in UDP stack related to bogus checksums :
      
      1) We return -EAGAIN to application even if receive queue is not empty.
         This breaks applications using edge trigger epoll()
      
      2) Under UDP flood, we can loop forever without yielding to other
         processes, potentially hanging the host, especially on non SMP.
      
      This patch is an attempt to make things better.
      
      We might in the future add extra support for rt applications
      wanting to better control time spent doing a recv() in a hostile
      environment. For example we could validate checksums before queuing
      packets in socket receive queue.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      556574d9
    • Ben Hutchings's avatar
      pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic · 75cf667b
      Ben Hutchings authored
      pipe_iov_copy_{from,to}_user() may be tried twice with the same iovec,
      the first time atomically and the second time not.  The second attempt
      needs to continue from the iovec position, pipe buffer offset and
      remaining length where the first attempt failed, but currently the
      pipe buffer offset and remaining length are reset.  This will corrupt
      the piped data (possibly also leading to an information leak between
      processes) and may also corrupt kernel memory.
      
      This was fixed upstream by commits f0d1bec9 ("new helper:
      copy_page_from_iter()") and 637b58c2 ("switch pipe_read() to
      copy_page_to_iter()"), but those aren't suitable for stable.  This fix
      for older kernel versions was made by Seth Jennings for RHEL and I
      have extracted it from their update.
      
      CVE-2015-1805
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1202855Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      75cf667b
    • Steven Rostedt's avatar
      tracing: Have filter check for balanced ops · 9fa3f3e6
      Steven Rostedt authored
      commit 2cf30dc1 upstream.
      
      When the following filter is used it causes a warning to trigger:
      
       # cd /sys/kernel/debug/tracing
       # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
      -bash: echo: write error: Invalid argument
       # cat events/ext4/ext4_truncate_exit/filter
      ((dev==1)blocks==2)
      ^
      parse_error: No error
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
       Modules linked in: bnep lockd grace bluetooth  ...
       CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
        0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
        ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
       Call Trace:
        [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
        [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
        [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
        [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81159065>] replace_preds+0x3c5/0x990
        [<ffffffff811596b2>] create_filter+0x82/0xb0
        [<ffffffff81159944>] apply_event_filter+0xd4/0x180
        [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
        [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
        [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
        [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
        [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
        [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
        [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
       ---[ end trace e11028bd95818dcd ]---
      
      Worse yet, reading the error message (the filter again) it says that
      there was no error, when there clearly was. The issue is that the
      code that checks the input does not check for balanced ops. That is,
      having an op between a closed parenthesis and the next token.
      
      This would only cause a warning, and fail out before doing any real
      harm, but it should still not caues a warning, and the error reported
      should work:
      
       # cd /sys/kernel/debug/tracing
       # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
      -bash: echo: write error: Invalid argument
       # cat events/ext4/ext4_truncate_exit/filter
      ((dev==1)blocks==2)
      ^
      parse_error: Meaningless filter expression
      
      And give no kernel warning.
      
      Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      [bwh: Backported to 3.2: drop the check for OP_NOT, which we don't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9fa3f3e6
    • Wang Long's avatar
      ring-buffer-benchmark: Fix the wrong sched_priority of producer · 67766970
      Wang Long authored
      commit 10802932 upstream.
      
      The producer should be used producer_fifo as its sched_priority,
      so correct it.
      
      Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.comSigned-off-by: default avatarWang Long <long.wanglong@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      67766970
    • Nikolay Aleksandrov's avatar
      bridge: fix multicast router rlist endless loop · a01afa11
      Nikolay Aleksandrov authored
      commit 1a040eac upstream.
      
      Since the addition of sysfs multicast router support if one set
      multicast_router to "2" more than once, then the port would be added to
      the hlist every time and could end up linking to itself and thus causing an
      endless loop for rlist walkers.
      So to reproduce just do:
      echo 2 > multicast_router; echo 2 > multicast_router;
      in a bridge port and let some igmp traffic flow, for me it hangs up
      in br_multicast_flood().
      Fix this by adding a check in br_multicast_add_router() if the port is
      already linked.
      The reason this didn't happen before the addition of multicast_router
      sysfs entries is because there's a !hlist_unhashed check that prevents
      it.
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: 0909e117 ("bridge: Add multicast_router sysfs entries")
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a01afa11
    • James Hogan's avatar
      MIPS: Fix enabling of DEBUG_STACKOVERFLOW · b9cc0994
      James Hogan authored
      commit 5f35b9cd upstream.
      
      Commit 334c86c4 ("MIPS: IRQ: Add stackoverflow detection") added
      kernel stack overflow detection, however it only enabled it conditional
      upon the preprocessor definition DEBUG_STACKOVERFLOW, which is never
      actually defined. The Kconfig option is called DEBUG_STACKOVERFLOW,
      which manifests to the preprocessor as CONFIG_DEBUG_STACKOVERFLOW, so
      switch it to using that definition instead.
      
      Fixes: 334c86c4 ("MIPS: IRQ: Add stackoverflow detection")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Adam Jiang <jiang.adam@gmail.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: http://patchwork.linux-mips.org/patch/10531/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b9cc0994
    • 洪一竹's avatar
      Input: elantech - add new icbody type · 309f1244
      洪一竹 authored
      commit 692dd191 upstream.
      
      This adds new icbody type to the list recognized by Elantech PS/2 driver.
      Signed-off-by: default avatarSam Hung <sam.hung@emc.com.tw>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      309f1244
    • Sam hung's avatar
      Input: elantech - support new ICs types for version 4 · fc499f97
      Sam hung authored
      commit 810aa091 upstream.
      
      This change allows the driver to recognize newer Elantech touchpads.
      Signed-off-by: default avatarYi ju Hong <sam.hung@emc.com.tw>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      fc499f97
    • Jordan Rife's avatar
      Input: elantech - add support for newer elantech touchpads · 5a0df908
      Jordan Rife authored
      commit ae4bedf0 upstream.
      
      Newer elantech touchpads are not recognized by the current driver, since it
      fails to detect their firmware version number. This prevents more advanced
      touchpad features from being usable such as two-finger scrolling. This
      patch allows newer touchpads to be detected and be fully functional. Tested
      on Sony Vaio SVF13N17PXB.
      Signed-off-by: default avatarJordan Rife <jrife0@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5a0df908
    • Matt Walker's avatar
      Input: elantech - add support for newer (August 2013) devices · 4881050e
      Matt Walker authored
      commit 9cb80b96 upstream.
      
      Added detection for newer Elantech touchpads, so that kernel doesn't
      fall-back to default PS/2 driver. Supports touchpads released after
      ~August 2013.  Fixes bug:
      https://lists.launchpad.net/kernel-packages/msg18481.html
      
      Tested on an Acer Aspire S7-392-6302.
      
      Signed-off by: Matt Walker <matt.g.d.walker@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4881050e
    • Matteo Delfino's avatar
      Input: elantech - fix for newer hardware versions (v7) · f605bc36
      Matteo Delfino authored
      commit 9eebed7d upstream.
      
      * Fix version recognition in elantech_set_properties
      
        The new hardware reports itself as v7 but the packets'
        structure is unaltered.
      
      * Fix packet type recognition in elantech_packet_check_v4
      
        The bitmask used for v6 is too wide, only the last three bits of
        the third byte in a packet (packet[3] & 0x03) are actually used to
        distinguish between packet types.
        Starting from v7, additional information (to be interpreted) is
        stored in the remaining bits (packets[3] & 0x1c).
        In addition, the value stored in (packet[0] & 0x0c) is no longer
        a constant but contains additional information yet to be deciphered.
        This change should be backwards compatible with v6 hardware.
      
      Additional-author: Giovanni Frigione <gio.frigione@gmail.com>
      Signed-off-by: default avatarMatteo Delfino <kendatsuba@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f605bc36
    • John D. Blair's avatar
      USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle · db54381f
      John D. Blair authored
      commit df72d588 upstream.
      
      Added the USB serial device ID for the HubZ dual ZigBee
      and Z-Wave radio dongle.
      Signed-off-by: default avatarJohn D. Blair <johnb@candicontrols.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      db54381f
    • Clemens Ladisch's avatar
      ALSA: usb-audio: fix missing input volume controls in MAYA44 USB(+) · 7751e0e8
      Clemens Ladisch authored
      commit ea114fc2 upstream.
      
      The driver worked around an error in the MAYA44 USB(+)'s mixer unit
      descriptor by aborting before parsing the missing field.  However,
      aborting parsing too early prevented parsing of the other units
      connected to this unit, so the capture mixer controls would be missing.
      
      Fix this by moving the check for this descriptor error after the parsing
      of the unit's input pins.
      Reported-by: default avatarnightmixes <nightmixes@gmail.com>
      Tested-by: default avatarnightmixes <nightmixes@gmail.com>
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [bwh: Backported to 3.2:
       - Adjust context
       - Logging statement was different]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7751e0e8
    • Clemens Ladisch's avatar
      ALSA: usb-audio: add MAYA44 USB+ mixer control names · a2066716
      Clemens Ladisch authored
      commit 044bddb9 upstream.
      
      Add mixer control names for the ESI Maya44 USB+ (which appears to be
      identical width the AudioTrak Maya44 USB).
      Reported-by: default avatarnightmixes <nightmixes@gmail.com>
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a2066716