1. 19 Apr, 2013 4 commits
    • mutex: Back out architecture specific check for negative mutex count · cc189d25
      Waiman Long authored
      Linus suggested that probably all the supported architectures can
      allow a negative mutex count without incorrect behavior, so we can
      then back out the architecture specific change and allow the
      mutex count to go to any negative number. That should further
      reduce contention for non-x86 architectures.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-5-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mutex: Queue mutex spinners with MCS lock to reduce cacheline contention · 2bd2c92c
      Waiman Long authored
      The current mutex spinning code (with MUTEX_SPIN_ON_OWNER option
      turned on) allows multiple tasks to spin on a single mutex
      concurrently. A potential problem with the current approach is
      that when the mutex becomes available, all the spinning tasks
      will try to acquire the mutex more or less simultaneously. As a
      result, there will be a lot of cacheline bouncing especially on
      systems with a large number of CPUs.
      
      This patch tries to reduce this kind of contention by putting
      the mutex spinners into a queue so that only the first one in
      the queue will try to acquire the mutex. This will reduce
      contention and allow all the tasks to move forward faster.
      
      The queuing of mutex spinners is done using an MCS lock based
      implementation, which reduces contention on the mutex cacheline
      further than a similar ticket spinlock based implementation would.
      This patch adds a new field to the mutex data structure
      for holding the MCS lock. This expands the mutex size by 8 bytes
      on 64-bit systems and 4 bytes on 32-bit systems. This overhead
      is avoided if the MUTEX_SPIN_ON_OWNER option is turned off.
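
      As an illustration of the queuing idea only, here is a minimal
      user-space sketch of an MCS-style lock written with C11 atomics.
      It is not the kernel code from this patch: the names mcs_node,
      mcs_lock, mcs_acquire and mcs_release are hypothetical, and the
      default sequentially consistent ordering is used for simplicity.
      The single tail pointer corresponds to the one extra
      pointer-sized field mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per spinner; each spinner waits on its own node only. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;              /* set to true by the predecessor */
};

/* The lock is a single pointer to the tail of the spinner queue. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
    atomic_store(&node->next, NULL);
    atomic_store(&node->locked, false);

    /* Append this node; the old tail (if any) is the predecessor. */
    struct mcs_node *prev = atomic_exchange(&lock->tail, node);
    if (prev == NULL)
        return;                      /* queue was empty: we are the head */

    atomic_store(&prev->next, node);
    while (!atomic_load(&node->locked))
        ;                            /* spin on our own node, not the lock */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *node)
{
    struct mcs_node *next = atomic_load(&node->next);

    if (next == NULL) {
        /* No visible successor: try to mark the queue empty. */
        struct mcs_node *expected = node;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor swapped the tail but has not linked itself yet. */
        while ((next = atomic_load(&node->next)) == NULL)
            ;
    }
    atomic_store(&next->locked, true);   /* hand over to the successor */
}
```

      Because each waiter spins on its own node, a release touches
      only the successor's cacheline instead of waking every spinner
      at once, which is the contention pattern the patch aims to
      avoid.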
      
      The following table shows the jobs per minute (JPM) scalability
      data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
      numactl command is used to restrict the running of the fserver
      workloads to 1/2/4/8 nodes with hyperthreading off.
      
      +-----------------+-----------+-----------+-------------+----------+
      |  Configuration  | Mean JPM  | Mean JPM  |  Mean JPM   | % Change |
      |                 | w/o patch | patch 1   | patches 1&2 |  1->1&2  |
      +-----------------+------------------------------------------------+
      |                 |              User Range 1100 - 2000            |
      +-----------------+------------------------------------------------+
      | 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
      | 4 nodes, HT off |  393503   |  381558   |   394650    |   +3.4%  |
      | 2 nodes, HT off |  334957   |  325240   |   338853    |   +4.2%  |
      | 1 node , HT off |  198141   |  197972   |   198075    |   +0.1%  |
      +-----------------+------------------------------------------------+
      |                 |              User Range 200 - 1000             |
      +-----------------+------------------------------------------------+
      | 8 nodes, HT off |  282325   |  312870   |   332185    |   +6.2%  |
      | 4 nodes, HT off |  390698   |  378279   |   393419    |   +4.0%  |
      | 2 nodes, HT off |  336986   |  326543   |   340260    |   +4.2%  |
      | 1 node , HT off |  197588   |  197622   |   197582    |    0.0%  |
      +-----------------+-----------+-----------+-------------+----------+
      
      At low user range 10-100, the JPM differences were within +/-1%.
      So they are not that interesting.
      
      The fserver workload uses mutex spinning extensively. With just
      the mutex change in the first patch, there is no noticeable
      change in performance.  Rather, there is a slight drop in
      performance. This mutex spinning patch more than recovers the
      lost performance and shows a significant increase of +30% at high
      user load with the full 8 nodes. Similar improvements were also
      seen in a 3.8 kernel.
      
      The table below shows the %time spent by different kernel
      functions as reported by perf when running the fserver workload
      at 1500 users with all 8 nodes.
      
      +-----------------------+-----------+---------+-------------+
      |        Function       |  % time   | % time  |   % time    |
      |                       | w/o patch | patch 1 | patches 1&2 |
      +-----------------------+-----------+---------+-------------+
      | __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
      | __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
      | mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
      | mspin_lock            |    N/A    |   N/A   |    9.90%    |
      | __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
      | _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
      +-----------------------+-----------+---------+-------------+
      
      The fserver workload for an 8-node system is dominated by the
      contention in the read/write lock. Mutex contention also plays a
      role. With the first patch only, mutex contention is down (as
      shown by the __mutex_lock_slowpath figure), which helps a
      little. We saw only a few percent improvement with that.
      
      By applying patch 2 as well, the single mutex_spin_on_owner
      figure is now split into mutex_spin_on_owner plus an additional
      mspin_lock figure. The combined time increases from 3.42% to
      12.23% (2.33% + 9.90%). It shows a great reduction in contention
      among the spinners, leading to a 30% improvement. The time ratio
      9.9/2.33=4.2 indicates that there are on average 4+ spinners
      waiting in the mspin_lock loop for each spinner in the
      mutex_spin_on_owner loop. Contention in other locking functions
      also goes down by quite a lot.
      
      The table below shows the performance change of both patches 1 &
      2 over patch 1 alone in other AIM7 workloads (at 8 nodes,
      hyperthreading off).
      
      +--------------+---------------+----------------+-----------------+
      |   Workload   | mean % change | mean % change  | mean % change   |
      |              | 10-100 users  | 200-1000 users | 1100-2000 users |
      +--------------+---------------+----------------+-----------------+
      | alltests     |      0.0%     |     -0.8%      |     +0.6%       |
      | five_sec     |     -0.3%     |     +0.8%      |     +0.8%       |
      | high_systime |     +0.4%     |     +2.4%      |     +2.1%       |
      | new_fserver  |     +0.1%     |    +14.1%      |    +34.2%       |
      | shared       |     -0.5%     |     -0.3%      |     -0.4%       |
      | short        |     -1.7%     |     -9.8%      |     -8.3%       |
      +--------------+---------------+----------------+-----------------+
      
      The short workload is the only one that shows a decline in
      performance probably due to the spinner locking and queuing
      overhead.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mutex: Make more scalable by doing less atomic operations · 0dc8c730
      Waiman Long authored
      In the __mutex_lock_common() function, an initial entry into
      the lock slow path will cause two atomic_xchg instructions to be
      issued. Together with the atomic decrement in the fast path, a
      total of three atomic read-modify-write instructions will be
      issued in rapid succession. This can cause a lot of cache
      bouncing when many tasks are trying to acquire the mutex at the
      same time.
      
      This patch will reduce the number of atomic_xchg instructions
      used by checking the counter value first before issuing the
      instruction. The atomic_read() function is just a simple memory
      read. The atomic_xchg() function, on the other hand, can be two
      or more orders of magnitude more expensive than atomic_read().
      By using atomic_read() to check the value first
      before calling atomic_xchg(), we can avoid a lot of unnecessary
      cache coherency traffic. The only downside with this change is
      that a task on the slow path will have a tiny bit less chance of
      getting the mutex when competing with another task in the fast
      path.
      
      The same is true for the atomic_cmpxchg() function in the
      mutex-spin-on-owner loop. So an atomic_read() is also performed
      before calling atomic_cmpxchg().
      
      The mutex locking and unlocking code for the x86 architecture
      can allow any negative number to be used in the mutex count to
      indicate that some tasks are waiting for the mutex. I am not so
      sure if that is the case for the other architectures. So the
      default is to avoid atomic_xchg() if the count has already been
      set to -1. For x86, the check is modified to include all
      negative numbers to cover a larger case.
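
      The read-before-exchange idea can be sketched in user-space C11
      as follows. This is an illustration under stated assumptions,
      not the kernel code: try_lock_slowpath and try_lock_spinner are
      hypothetical names, the count convention (1 = unlocked,
      0 = locked, negative = locked with possible waiters) is taken
      from the description above, and the "any negative value" check
      corresponds to the x86 variant described in this message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Slow-path attempt: a plain load filters out the case where the count
 * is already negative, so the expensive exchange is only issued when
 * it can change something.
 */
static bool try_lock_slowpath(atomic_int *count)
{
    if (atomic_load_explicit(count, memory_order_relaxed) < 0)
        return false;                /* skip the read-modify-write */
    /* Mark "locked, maybe waiters"; we own the lock if it was free. */
    return atomic_exchange(count, -1) == 1;
}

/*
 * Spinner attempt: only try the compare-and-exchange when a plain load
 * shows the mutex as unlocked.
 */
static bool try_lock_spinner(atomic_int *count)
{
    int expected = 1;

    if (atomic_load_explicit(count, memory_order_relaxed) != 1)
        return false;                /* skip the read-modify-write */
    return atomic_compare_exchange_strong(count, &expected, 0);
}
```

      The plain load can be served from a shared copy of the
      cacheline, whereas the exchange needs exclusive ownership of it;
      skipping the exchange when it cannot succeed is what avoids the
      coherency traffic.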
      
      The following table shows the jobs per minute (JPM) scalability
      data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
      numactl command is used to restrict the running of the
      high_systime workloads to 1/2/4/8 nodes with hyperthreading on
      and off.
      
      +-----------------+-----------+------------+----------+
      |  Configuration  | Mean JPM  |  Mean JPM  | % Change |
      |                 | w/o patch | with patch |          |
      +-----------------+-----------------------------------+
      |                 |      User Range 1100 - 2000       |
      +-----------------+-----------------------------------+
      | 8 nodes, HT on  |   36980   |   148590   | +301.8%  |
      | 8 nodes, HT off |   42799   |   145011   | +238.8%  |
      | 4 nodes, HT on  |   61318   |   118445   |  +51.1%  |
      | 4 nodes, HT off |  158481   |   158592   |   +0.1%  |
      | 2 nodes, HT on  |  180602   |   173967   |   -3.7%  |
      | 2 nodes, HT off |  198409   |   198073   |   -0.2%  |
      | 1 node , HT on  |  149042   |   147671   |   -0.9%  |
      | 1 node , HT off |  126036   |   126533   |   +0.4%  |
      +-----------------+-----------------------------------+
      |                 |       User Range 200 - 1000       |
      +-----------------+-----------------------------------+
      | 8 nodes, HT on  |   41525   |   122349   | +194.6%  |
      | 8 nodes, HT off |   49866   |   124032   | +148.7%  |
      | 4 nodes, HT on  |   66409   |   106984   |  +61.1%  |
      | 4 nodes, HT off |  119880   |   130508   |   +8.9%  |
      | 2 nodes, HT on  |  138003   |   133948   |   -2.9%  |
      | 2 nodes, HT off |  132792   |   131997   |   -0.6%  |
      | 1 node , HT on  |  116593   |   115859   |   -0.6%  |
      | 1 node , HT off |  104499   |   104597   |   +0.1%  |
      +-----------------+-----------+------------+----------+
      
      At low user range 10-100, the JPM differences were within +/-1%.
      So they are not that interesting.
      
      AIM7 benchmark runs have a fairly large run-to-run variance due
      to the random nature of the subtests executed, so a difference
      of less than +/-5% may not be significant.
      
      This patch improves high_systime workload performance at 4 nodes
      and up by maintaining transaction rates without significant
      drop-off at high node count.  The patch has practically no
      impact on 1- and 2-node systems.
      
      The table below shows the percentage time (as reported by perf
      record -a -s -g) spent on the __mutex_lock_slowpath() function
      by the high_systime workload at 1500 users for 2/4/8-node
      configurations with hyperthreading off.
      
      +---------------+-----------------+------------------+---------+
      | Configuration | %Time w/o patch | %Time with patch | %Change |
      +---------------+-----------------+------------------+---------+
      |    8 nodes    |      65.34%     |      0.69%       |  -99%   |
      |    4 nodes    |       8.70%     |      1.02%       |  -88%   |
      |    2 nodes    |       0.41%     |      0.32%       |  -22%   |
      +---------------+-----------------+------------------+---------+
      
      It is obvious that the dramatic performance improvement at 8
      nodes was due to the drastic cut in the time spent within the
      __mutex_lock_slowpath() function.
      
      The table below shows the improvements in other AIM7 workloads
      (at 8 nodes, hyperthreading off).
      
      +--------------+---------------+----------------+-----------------+
      |   Workload   | mean % change | mean % change  | mean % change   |
      |              | 10-100 users  | 200-1000 users | 1100-2000 users |
      +--------------+---------------+----------------+-----------------+
      | alltests     |     +0.6%     |   +104.2%      |   +185.9%       |
      | five_sec     |     +1.9%     |     +0.9%      |     +0.9%       |
      | fserver      |     +1.4%     |     -7.7%      |     +5.1%       |
      | new_fserver  |     -0.5%     |     +3.2%      |     +3.1%       |
      | shared       |    +13.1%     |   +146.1%      |   +181.5%       |
      | short        |     +7.4%     |     +5.0%      |     +4.2%       |
      +--------------+---------------+----------------+-----------------+
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-3-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mutex: Move mutex spinning code from sched/core.c back to mutex.c · 41fcb9f2
      Waiman Long authored
      As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler
      feature bit was really just an early hack to make with/without
      mutex-spinning testable. So it is no longer necessary.
      
      This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and
      moves the mutex spinning code from kernel/sched/core.c back to
      kernel/mutex.c, which is where it belongs.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 10 Apr, 2013 1 commit
  3. 08 Apr, 2013 2 commits
  4. 07 Apr, 2013 7 commits
  5. 06 Apr, 2013 2 commits
    • Merge tag 'dm-3.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · fe696909
      Linus Torvalds authored
      Pull device-mapper fixes from Alasdair Kergon:
       "A pair of patches to fix the writethrough mode of the device-mapper
        cache target when the device being cached is not itself wrapped with
        device-mapper."
      
      * tag 'dm-3.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
        dm cache: reduce bio front_pad size in writeback mode
        dm cache: fix writes to cache device in writethrough mode
    • Merge tag 'pci-v3.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b196553a
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "PCI updates for v3.9:
      
        ASPM
            Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
        kexec
            PCI: Don't try to disable Bus Master on disconnected PCI devices
        Platform ROM images
            PCI: Add PCI ROM helper for platform-provided ROM images
            nouveau: Attempt to use platform-provided ROM image
            radeon: Attempt to use platform-provided ROM image
        Hotplug
            PCI/ACPI: Always resume devices on ACPI wakeup notifications
            PCI/PM: Disable runtime PM of PCIe ports
        EISA
            EISA/PCI: Fix bus res reference
            EISA/PCI: Init EISA early, before PNP"
      
      * tag 'pci-v3.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/PM: Disable runtime PM of PCIe ports
        PCI/ACPI: Always resume devices on ACPI wakeup notifications
        PCI: Don't try to disable Bus Master on disconnected PCI devices
        Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
        radeon: Attempt to use platform-provided ROM image
        nouveau: Attempt to use platform-provided ROM image
        EISA/PCI: Init EISA early, before PNP
        EISA/PCI: Fix bus res reference
        PCI: Add PCI ROM helper for platform-provided ROM images
  6. 05 Apr, 2013 24 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 53f63189
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix erroneous sock_orphan() leading to crashes and double
          kfree_skb() in NFC protocol.  From Thierry Escande and Samuel Ortiz.
      
       2) Fix use after free in remain-on-channel mac80211 code, from Johannes
          Berg.
      
       3) nf_reset() needs to reset the NF tracing cookie, otherwise we can
          leak it from one namespace into another.  Fix from Gao Feng and
          Patrick McHardy.
      
       4) Fix overflow in channel scanning array of mwifiex driver, from Stone
          Piao.
      
       5) Fix loss of link after suspend/shutdown in r8169, from Hayes Wang.
      
       6) Synchronization of unicast address lists to the underlying device
          doesn't work because whether to sync is maintained as a boolean
          rather than a true count.  Fix from Vlad Yasevich.
      
       7) Fix corruption of TSO packets in atl1e by limiting the segmented
          packet length.  From Hannes Frederic Sowa.
      
       8) Revert bogus AF_UNIX credential passing change and fix the
          coalescing issue properly, from Eric W Biederman.
      
       9) Changes to ipv4 address lifetime settings need to generate a
          notification, from Jiri Pirko.
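
      To make item 6 concrete, here is a small sketch of why a counter
      works where a boolean does not when the same address is synced
      more than once. The struct and function names are hypothetical
      and do not mirror the kernel's address-list API.

```c
#include <stdbool.h>

/* Hypothetical per-address bookkeeping: a count of syncs, not a flag. */
struct hw_addr_example {
    int sync_cnt;
};

/* Record one more sync of this address to a lower device. */
static void addr_sync(struct hw_addr_example *a)
{
    a->sync_cnt++;
}

/*
 * Drop one sync. Returns true only when the last sync is gone, i.e.
 * when the address may actually be removed from the lower device.
 */
static bool addr_unsync(struct hw_addr_example *a)
{
    if (a->sync_cnt == 0)
        return false;                /* nothing was synced */
    return --a->sync_cnt == 0;
}
```

      With a boolean, the first unsync after two syncs would already
      report the address as unsynced; with a count, only the second
      one does.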
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
        netfilter: don't reset nf_trace in nf_reset()
        net: ipv4: notify when address lifetime changes
        ixgbe: fix registration order of driver and DCA nofitication
        af_unix: If we don't care about credentials coallesce all messages
        Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL"
        bonding: remove sysfs before removing devices
        atl1e: limit gso segment size to prevent generation of wrong ip length fields
        net: count hw_addr syncs so that unsync works properly.
        r8169: fix auto speed down issue
        netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths
        mwifiex: limit channel number not to overflow memory
        NFC: microread: Fix build failure due to a new MEI bus API
        iwlwifi: dvm: fix the passive-no-RX workaround
        netfilter: nf_conntrack: fix error return code
        NFC: llcp: Keep the connected socket parent pointer alive
        mac80211: fix idle handling sequence
        netfilter: nfnetlink_acct: return -EINVAL if object name is empty
        netfilter: nfnetlink_queue: fix error return code in nfnetlink_queue_init()
        netfilter: reset nf_trace in nf_reset
        mac80211: fix remain-on-channel cancel crash
        ...
    • x86: Fix rebuild with EFI_STUB enabled · 91870824
      Jan Beulich authored
      eboot.o and efi_stub_$(BITS).o didn't get added to "targets", and hence
      their .cmd files don't get included by the build machinery, leading to
      the files always getting rebuilt.
      
      Rather than adding the two files individually, take the opportunity and
      add $(VMLINUX_OBJS) to "targets" instead, thus allowing the assignment
      at the top of the file to be shrunk quite a bit.
      
      At the same time, remove a pointless flags override line - the variable
      assigned to was misspelled anyway, and the options added are
      meaningless for assembly sources.
      
      [ hpa: the patch is not minimal, but I am taking it for -urgent anyway
        since the excess impact of the patch seems to be small enough. ]
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/515C5D2502000078000CA6AD@nat28.tlf.novell.com
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • netfilter: don't reset nf_trace in nf_reset() · 124dff01
      Patrick McHardy authored
      Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
      to reset nf_trace in nf_reset(). This is wrong and unnecessary.
      
      nf_reset() is used in the following cases:
      
      - when passing packets up to the socket layer, at which point we want to
        release all netfilter references that might keep modules pinned while
        the packet is queued. nf_trace doesn't matter anymore at this point.
      
      - when encapsulating or decapsulating IPsec packets. We want to continue
        tracing these packets after IPsec processing.
      
      - when passing packets through virtual network devices. Only devices
        that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
        used anymore. It's not entirely clear whether those packets should
        be traced after that, however we've always done that.
      
      - when passing packets through virtual network devices that make the
        packet cross network namespace boundaries. This is the only case
        where we clearly want to reset nf_trace and is also what the
        original patch intended to fix.
      
      Add a new function nf_reset_trace() and use it in dev_forward_skb() to
      fix this properly.
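
      The split described above can be sketched with a toy packet
      structure. This is an assumption-laden illustration, not the
      kernel implementation: struct pkt and the pkt_* helpers are
      hypothetical stand-ins for sk_buff, nf_reset(), nf_reset_trace()
      and the namespace-crossing forward path, and the forward helper
      calling both resets is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields that matter here. */
struct pkt {
    void *nf_ref;     /* stands in for netfilter references pinning modules */
    bool  nf_trace;   /* packet is being traced */
};

/* Like nf_reset() after this fix: release references, keep tracing. */
static void pkt_nf_reset(struct pkt *p)
{
    p->nf_ref = NULL;
}

/* Like the new nf_reset_trace(): clear only the trace flag. */
static void pkt_nf_reset_trace(struct pkt *p)
{
    p->nf_trace = false;
}

/* Namespace-crossing forward path: the one place that clears both. */
static void pkt_forward_across_netns(struct pkt *p)
{
    pkt_nf_reset(p);
    pkt_nf_reset_trace(p);
}
```

      Keeping the two resets separate lets the IPsec and socket-layer
      callers keep their existing behaviour while only the
      namespace-crossing path drops the trace state.
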
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 6cfa9238
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
       "Fixes for a number of small glitches in various corners of the MIPS
        tree.  No particular area stands out.
      
        With this applied all MIPS defconfigs are building fine.  No merge
        conflicts are expected."
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: Delete definition of SA_RESTORER.
        MIPS: Fix ISA level which causes secondary cache init bypassing and more
        MIPS: Fix build error cavium-octeon without CONFIG_SMP
        MIPS: Kconfig: Rename SNIPROM too
        MIPS: Alchemy: Fix typo "CONFIG_DEBUG_PCI"
        MIPS: Unbreak function tracer for 64-bit kernel.
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes · 00fa6fe9
      Linus Torvalds authored
      Pull GFS2 fixes from Steven Whitehouse:
       "There are two patches which fix up a couple of minor issues in the DLM
        interface code, a missing error path in gfs2_rs_alloc(), one patch
        which fixes a problem during "withdraw" and a fix for discards/FITRIM
        when using 4k sector sized devices."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
        GFS2: Issue discards in 512b sectors
        GFS2: Fix unlock of fcntl locks during withdrawn state
        GFS2: return error if malloc failed in gfs2_rs_alloc()
        GFS2: use memchr_inv
        GFS2: use kmalloc for lvb bitmap
    • firmware,IB/qib: revert firmware file move · ff802e31
      Mike Marciniszyn authored
      Commit e2eed58b ("IB/qib: change QLogic to Intel") moved a firmware
      file potentially breaking the ABI.
      
      This patch reverts that aspect of the fix as well as reverting the
      firmware name as used in qib.
      Reported-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'spi-fix-v3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc · e0a77f26
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A bunch of small driver fixes plus a fix for error handling in the
        core - nothing too exciting overall."
      
      * tag 'spi-fix-v3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc:
        spi/mpc512x-psc: optionally keep PSC SS asserted across xfer segmensts
        spi: Unlock a spinlock before calling into the controller driver.
        spi/s3c64xx: modified error interrupt handling and init
        spi/bcm63xx: don't disable non enabled clocks in probe error path
        spi/bcm63xx: Remove unused variable
        spi: slink-tegra20: move runtime pm calls to transfer_one_message
    • GFS2: Issue discards in 512b sectors · b2c87cae
      Bob Peterson authored
      This patch changes GFS2's discard issuing code so that it calls
      function sb_issue_discard rather than blkdev_issue_discard. The
      code was calling blkdev_issue_discard and specifying the correct
      sector offset and sector size, but blkdev_issue_discard expects
      these values to be in terms of 512 byte sectors, even if the native
      sector size for the device is different. Calling sb_issue_discard
      with the BLOCK size instead ensures the correct block-to-512b-sector
      translation. I verified that "minlen" is specified in blocks, so
      comparing it to a number of blocks is correct.
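
      The block-to-512b-sector translation mentioned above amounts to
      a shift by the difference between the block-size exponent and 9.
      A minimal sketch, assuming a power-of-two block size of at least
      512 bytes; blocks_to_512b_sectors is a hypothetical helper name,
      not a kernel function:

```c
#include <stdint.h>

/*
 * Convert a filesystem block number (or block count) to 512-byte
 * sectors. blocksize_bits is log2 of the block size, for example
 * 12 for 4096-byte blocks, and must be at least 9.
 */
static uint64_t blocks_to_512b_sectors(uint64_t blocks,
                                       unsigned int blocksize_bits)
{
    return blocks << (blocksize_bits - 9);
}
```

      For example, 10 blocks of 4096 bytes correspond to 80 sectors of
      512 bytes, regardless of the device's native sector size.
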
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • Revert "drivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR" · e24b0bfa
      Johan Hovold authored
      This reverts commit 0ef1594c.
      
      This patch introduced a few races which cannot be easily fixed with a
      small follow-up patch. Furthermore, the SoC with the broken hardware
      register, which this patch intended to add support for, can only be used
      with device trees, which this driver currently does not support.
      
      [ Here is the discussion that led to this "revert" patch:
        https://lkml.org/lkml/2013/4/3/176 ]
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'fbdev-fixes-3.9-rc6' of git://gitorious.org/linux-omap-dss2/linux · c4c80f63
      Linus Torvalds authored
      Pull fbdev fixes from Tomi Valkeinen:
       "Fix uvesafb crash bug and typoed flag name in fbmon's new videomode
        code"
      
      * tag 'fbdev-fixes-3.9-rc6' of git://gitorious.org/linux-omap-dss2/linux:
        video:uvesafb: Fix dereference NULL pointer code path
        fbmon: use VESA_DMT_VSYNC_HIGH to fix typo
    • Merge tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 8f09aacf
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This contains slightly more volumes than usual at this stage, mostly
        because of my vacation in the last week.  Nothing to scare, all small
        and/or trivial fixes:
      
         - Fix loop path handling in ASoC DAPM
         - Some memory handling fixes in ASoC core
         - Fix spear_pcm to adapt to the updated API
         - HD-audio HDMI ELD handling fixes
         - Fix for CM6331 USB-audio SRC change bugs
         - Revert power_save_controller option change due to user-space usage
         - A few other small ASoC and HD-audio fixes"
      
      * tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/generic - fix uninitialized variable
        Revert "ALSA: hda - Allow power_save_controller option override DCAPS"
        ALSA: hda - fix typo in proc output
        ALSA: hda - Enabling Realtek ALC 671 codec
        ALSA: usb: Work around CM6631 sample rate change bug
        ALSA: hda - bug fix on HDMI ELD debug message
        ALSA: hda - bug fix on return value when getting HDMI ELD info
        ASoC: dma-sh7760: Fix compile error
        ASoC: core: fix invalid free of devm_ allocated data
        ASoC: spear_pcm: Update to new pcm_new() API
        ASoC:: max98090: Remove executable bit
        ASoC: dapm: Fix pointer dereference in is_connected_output_ep()
        ASoC: pcm030 audio fabric: remove __init from probe
        ASoC: imx-ssi: Fix occasional AC97 reset failure
        ASoC: core: fix possible memory leak in snd_soc_bytes_put()
        ASoC: wm_adsp: fix possible memory leak in wm_adsp_load_coeff()
        ASoC: dapm: Fix handling of loops
        ASoC: si476x: Add missing break for SNDRV_PCM_FORMAT_S8 switch case
      8f09aacf
    • Mike Snitzer's avatar
      dm cache: reduce bio front_pad size in writeback mode · 19b0092e
      Mike Snitzer authored
      A recent patch to fix the dm cache target's writethrough mode extended
      the bio's front_pad to include a 1056-byte struct dm_bio_details.
      Writeback mode doesn't need this, so this patch reduces the
      per_bio_data_size to 16 bytes in this case instead of 1096.
      
      The dm_bio_details structure was added in "dm cache: fix writes to
      cache device in writethrough mode" which fixed commit e2e74d61 ("dm
      cache: fix race in writethrough implementation").  In writeback mode
      we avoid allocating the writethrough-specific members of the
      per_bio_data structure (the dm_bio_details structure included).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      19b0092e
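One way to get a smaller per-bio allocation in writeback mode is to place the writethrough-only members at the end of the structure and size the allocation with offsetof(). The following is a minimal userspace sketch under that assumption; the `toy_` types and their field layout are hypothetical and are not the driver's actual definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the large saved-state structure. */
struct toy_bio_details {
	unsigned char saved[1056];
};

/*
 * Hypothetical per-bio data layout: members needed in every mode come
 * first, writethrough-only members come last.
 */
struct toy_per_bio_data {
	bool tick;
	unsigned int req_nr;
	/* Writethrough-only members start here. */
	void *cache;
	struct toy_bio_details bio_details;
};

/* Writeback mode only needs the members placed before 'cache'. */
static size_t toy_pb_data_size(bool writethrough)
{
	if (writethrough)
		return sizeof(struct toy_per_bio_data);
	return offsetof(struct toy_per_bio_data, cache);
}
```

Every accessor of the per-bio data then has to be told which size applies, so that writeback code never touches the members past the cut-off.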
    • Darrick J. Wong's avatar
      dm cache: fix writes to cache device in writethrough mode · b844fe69
      Darrick J. Wong authored
      The dm-cache writethrough strategy introduced by commit e2e74d61
      ("dm cache: fix race in writethrough implementation") issues a bio to
      the origin device, remaps and then issues the bio to the cache device.
      This more conservative in-series approach was selected to favor
      correctness over performance (of the previous parallel writethrough).
      However, this in-series implementation that reuses the same bio to write
      both the origin and cache device didn't take into account that the block
      layer's req_bio_endio() modifies a completing bio's bi_sector and
      bi_size.  So the new writethrough strategy needs to preserve these bio
      fields, and restore them before submission to the cache device,
      otherwise nothing gets written to the cache (because bi_size is 0).
      
      This patch adds a struct dm_bio_details field to struct per_bio_data,
      and uses dm_bio_record() and dm_bio_restore() to ensure the bio is
      restored before reissuing to the cache device.  Adding such a large
      structure to the per_bio_data is not ideal, but we can improve this
      later; for now correctness is the important thing.
      
      This problem initially went unnoticed because the dm-cache test-suite
      uses a linear DM device for the dm-cache device's origin device.
      Writethrough worked as expected because DM submits a *clone* of the
      original bio, so the original bio which was reused for the cache was
      never touched.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      b844fe69
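The record/restore idea in b844fe69 can be illustrated with a small userspace model. This is only a sketch: `toy_bio`, `toy_bio_record()`, `toy_bio_restore()` and `toy_complete()` are hypothetical simplifications, not the kernel's struct bio, dm_bio_record(), dm_bio_restore() or req_bio_endio().

```c
#include <stdint.h>

/* Hypothetical, heavily simplified bio: only the two fields discussed. */
struct toy_bio {
	uint64_t sector;	/* start sector of the remaining I/O */
	uint32_t size;		/* remaining bytes */
};

/* Saved copy of the fields that completion modifies. */
struct toy_bio_details {
	uint64_t sector;
	uint32_t size;
};

static void toy_bio_record(struct toy_bio_details *bd, const struct toy_bio *bio)
{
	bd->sector = bio->sector;
	bd->size = bio->size;
}

static void toy_bio_restore(const struct toy_bio_details *bd, struct toy_bio *bio)
{
	bio->sector = bd->sector;
	bio->size = bd->size;
}

/* Models completion: the sector advances and the remaining size drops to 0. */
static void toy_complete(struct toy_bio *bio)
{
	bio->sector += bio->size >> 9;
	bio->size = 0;
}

/*
 * In-series writethrough: write to the origin first, then reuse the same
 * bio for the cache. Returns the size the cache device would see.
 */
static uint32_t toy_writethrough(struct toy_bio *bio, int restore)
{
	struct toy_bio_details bd;

	toy_bio_record(&bd, bio);
	toy_complete(bio);		/* write to the origin completes */
	if (restore)
		toy_bio_restore(&bd, bio);
	return bio->size;
}
```

Without the restore step the reused bio reaches the cache with a size of zero, which matches the symptom the commit message describes.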
    • Ralf Baechle's avatar
      MIPS: Delete definition of SA_RESTORER. · 80fa8181
      Ralf Baechle authored
      SA_RESTORER used to be defined as 0x04000000 but only the O32 ABI ever
      supported its use and no libc was using it, so the entire sa-restorer
      functionality was removed with lmo commit 39bffc12c3580ab [Zap sa_restorer.]
      for 2.5.48 retaining only the SA_RESTORER definition as a reminder to avoid
      accidental reuse of the mask bit.
      
      Upstream cdef9602 [signal: always clear sa_restorer on execve] adds
      code that assumes sa_sigaction has an sa_restorer field if SA_RESTORER
      is defined, which would break MIPS.  So remove the SA_RESTORER
      definition before the v3.8.4 merge.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      (cherry picked from commit 17da8d63add23830892ac4dc2cbb3b5d4ffb79a8)
      80fa8181
    • Deng-Cheng Zhu's avatar
      MIPS: Fix ISA level which causes secondary cache init bypassing and more · adb37892
      Deng-Cheng Zhu authored
      Commit a96102be introduced set_isa(), which sets the compatible ISA
      flags in addition to the one passed in. For example, a 1004K will have
      the MIPS_CPU_ISA_M32R2/M32R1/II/I flags set. This makes equality checks
      like the following inappropriate:
      
      if (c->isa_level == MIPS_CPU_ISA_M32R1 ||
          c->isa_level == MIPS_CPU_ISA_M32R2 ||
          c->isa_level == MIPS_CPU_ISA_M64R1 ||
          c->isa_level == MIPS_CPU_ISA_M64R2)
      
      This patch fixes it.
      Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      adb37892
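Why the equality checks in adb37892 stop working once isa_level carries several flags can be shown with a small sketch. The flag values and the `toy_` helpers below are illustrative assumptions, not the kernel's definitions, and the mask test is one plausible shape for the fix rather than a quote of the actual patch.

```c
/* Illustrative flag values only; the kernel's definitions may differ. */
#define TOY_ISA_I	0x001
#define TOY_ISA_II	0x002
#define TOY_ISA_M32R1	0x020
#define TOY_ISA_M32R2	0x040
#define TOY_ISA_M64R1	0x080
#define TOY_ISA_M64R2	0x100

/* Simplified model of a set_isa()-style helper for a 32-bit R2 core. */
static unsigned int toy_set_isa_m32r2(void)
{
	return TOY_ISA_M32R2 | TOY_ISA_M32R1 | TOY_ISA_II | TOY_ISA_I;
}

/* Old-style test: only true if exactly one flag is set. */
static int toy_is_mips32_64_by_equality(unsigned int isa_level)
{
	return isa_level == TOY_ISA_M32R1 ||
	       isa_level == TOY_ISA_M32R2 ||
	       isa_level == TOY_ISA_M64R1 ||
	       isa_level == TOY_ISA_M64R2;
}

/* Mask test: true if any of the flags of interest is set. */
static int toy_is_mips32_64_by_mask(unsigned int isa_level)
{
	return (isa_level & (TOY_ISA_M32R1 | TOY_ISA_M32R2 |
			     TOY_ISA_M64R1 | TOY_ISA_M64R2)) != 0;
}
```

With cumulative flags, the equality form silently evaluates to false for a core that should match, which is how code guarded by such a check can end up being bypassed.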
    • EunBong Song's avatar
      MIPS: Fix build error cavium-octeon without CONFIG_SMP · ed1197f9
      EunBong Song authored
      Signed-off-by: EunBong Song <eunb.song@samsung.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      ed1197f9
    • Paul Bolle's avatar
      MIPS: Kconfig: Rename SNIPROM too · aaa9fad3
      Paul Bolle authored
      CONFIG_SNIPROM was renamed to CONFIG_FW_SNIPROM in v3.8. Let's rename
      SNIPROM itself too.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      aaa9fad3
    • Paul Bolle's avatar
      MIPS: Alchemy: Fix typo "CONFIG_DEBUG_PCI" · 143f0f65
      Paul Bolle authored
      Commit 7517de34 ("MIPS: Alchemy: Redo PCI as platform driver") added a
      reference to CONFIG_DEBUG_PCI. Change it to CONFIG_PCI_DEBUG, as that
      is a valid Kconfig macro.
      
      Also add a newline to a debugging printk that this fix enables.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      143f0f65
    • David Daney's avatar
      MIPS: Unbreak function tracer for 64-bit kernel. · ad8c3969
      David Daney authored
      Commit 58b69401 [MIPS: Function tracer: Fix broken function tracing]
      completely broke the function tracer for 64-bit kernels.  The symptom is
      a system hang very early in the boot process.
      
      The fix: Remove/fix $sp adjustments for 64-bit case.
      Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: Al Cooper <alcooperx@gmail.com>
      Cc: viric@viric.name
      Cc: stable@vger.kernel.org # 3.8.x
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      ad8c3969
    • Jiri Slaby's avatar
      ALSA: hda/generic - fix uninitialized variable · 868211db
      Jiri Slaby authored
      changed is not initialized in path_power_down_sync, but it is expected
      to be false in case no change happened in the loop. So set it to
      false.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      868211db
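The pattern behind 868211db is a flag that is only assigned inside a loop and therefore needs a defined starting value. A minimal sketch, assuming a hypothetical `toy_power_down()` helper that is unrelated to the real path_power_down_sync() body:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: switches off every active entry and reports
 * whether anything actually changed.
 */
static bool toy_power_down(bool *active, size_t n)
{
	bool changed = false;	/* must start false: the loop may never set it */
	size_t i;

	for (i = 0; i < n; i++) {
		if (!active[i])
			continue;
		active[i] = false;
		changed = true;
	}
	return changed;
}
```

If the initializer were missing and no entry was active, the function would return an indeterminate value.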
    • Jiri Pirko's avatar
      net: ipv4: notify when address lifetime changes · 34e2ed34
      Jiri Pirko authored
      If userspace changes the lifetime of an address, send a netlink
      notification and call the notifier.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34e2ed34
    • Jakub Kicinski's avatar
      ixgbe: fix registration order of driver and DCA nofitication · f01fc1a8
      Jakub Kicinski authored
      ixgbe_notify_dca cannot be called before driver registration
      because it expects the driver's klist_devices to be allocated and
      initialized. While at it, make sure debugfs files are removed
      when registration fails.
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f01fc1a8
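The ordering constraint in f01fc1a8 (register the driver first, hook up the notifier afterwards, and undo earlier steps on failure) can be sketched as a small state model. All names below are hypothetical; none of them are ixgbe or driver-core functions.

```c
#include <stdbool.h>

/* Hypothetical state flags standing in for real subsystem registrations. */
struct toy_state {
	bool debugfs_ready;
	bool driver_registered;
	bool notifier_registered;
};

/* Hypothetical registration step; 'fail' simulates an error return. */
static int toy_register_driver(struct toy_state *s, bool fail)
{
	if (fail)
		return -1;
	s->driver_registered = true;
	return 0;
}

static int toy_init(struct toy_state *s, bool fail_register)
{
	int ret;

	s->debugfs_ready = true;		/* step 1: debugfs entries */

	ret = toy_register_driver(s, fail_register);	/* step 2 */
	if (ret) {
		s->debugfs_ready = false;	/* undo step 1 on failure */
		return ret;
	}

	/* step 3: the notifier may now rely on the registered driver */
	s->notifier_registered = true;
	return 0;
}
```

The point of the sketch is only the order of the steps and the cleanup on the error path.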
    • Eric W. Biederman's avatar
      af_unix: If we don't care about credentials coallesce all messages · 0e82e7f6
      Eric W. Biederman authored
      It was reported that the following LSB test case failed
      https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
      were not coalescing unix stream messages when the application was
      expecting us to.
      
      The problem was that the first send was before the socket was accepted
      and thus sock->sk_socket was NULL in maybe_add_creds, and the second
      send after the socket was accepted had a non-NULL value for sk->socket
      and thus we could tell the credentials were not needed so we did not
      bother.
      
      The unnecessary credentials on the first message caused
      unix_stream_recvmsg to start verifying that all messages had the same
      credentials before coalescing, and then the coalescing failed because
      the second message had no credentials.
      
      Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
      long-standing pessimization which would fail to coalesce messages when
      reading from a unix stream socket if the senders were different, even if
      we did not care about their credentials.
      
      I have tested this and verified that in the LSB test case mentioned
      above the messages do coalesce now, while they were failing to
      coalesce without this change.
      Reported-by: Karel Srot <ksrot@redhat.com>
      Reported-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e82e7f6
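The behaviour described in 0e82e7f6 can be modelled in a few lines: a stream read merges queued messages, and only compares credentials when the reader has asked for them. This is a toy model with hypothetical names, not the logic of unix_stream_recvmsg() itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued message: optional sender credentials plus a length. */
struct toy_msg {
	bool has_creds;
	int uid;
	size_t len;
};

static bool toy_same_creds(const struct toy_msg *a, const struct toy_msg *b)
{
	if (a->has_creds != b->has_creds)
		return false;
	return !a->has_creds || a->uid == b->uid;
}

/*
 * Returns the number of bytes one read would deliver. Messages are merged
 * until the credentials differ, but only if the reader cares about them.
 */
static size_t toy_stream_read(const struct toy_msg *q, size_t n, bool check_creds)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++) {
		if (check_creds && i > 0 && !toy_same_creds(&q[0], &q[i]))
			break;
		total += q[i].len;
	}
	return total;
}
```

In the failing case from the report, the first message carried credentials and the second did not, so a reader that always compared credentials would stop after the first message.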
    • Eric W. Biederman's avatar
      Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL" · 25da0e3e
      Eric W. Biederman authored
      This reverts commit 14134f65.
      
      The problem that the above patch was meant to address is that af_unix
      messages are not being coalesced because we are sending unnecessary
      credentials.  Not sending credentials in maybe_add_creds totally
      breaks unconnected unix domain sockets that wish to send credentials
      to other sockets.
      
      In practice this breaks some versions of udev because they receive a
      message and the sending uid is bogus so they drop the message.
      Reported-by: Sven Joachim <svenjoac@gmx.de>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25da0e3e