1. 12 Feb, 2014 1 commit
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 45f7fdc2
      Linus Torvalds authored
      Pull powerpc updates from Ben Herrenschmidt:
       "Here is some powerpc goodness for -rc2.  Arguably -rc1 material more
        than -rc2 but I was travelling (again !)
      
        It's mostly bug fixes including regressions, but there are a couple of
        new things that I decided to drop-in.
      
        One is a straightforward patch from Michael to add a bunch of P8 cache
        events to perf.
      
        The other one is a patch by myself to add the direct DMA (iommu
        bypass) for PCIe on Power8 for 64-bit capable devices.  This has been
        around for a while, I had lost track of it.  However it's been in our
        internal kernels we use for testing P8 already and it affects only P8
        related code.  Since P8 is still unreleased the risk is pretty much
        nil at this point"
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc/powernv: Add iommu DMA bypass support for IODA2
        powerpc: Fix endian issues in kexec and crash dump code
        powerpc/ppc32: Fix the bug in the init of non-base exception stack for UP
        powerpc/xmon: Don't signal we've entered until we're finished printing
        powerpc/xmon: Fix timeout loop in get_output_lock()
        powerpc/xmon: Don't loop forever in get_output_lock()
        powerpc/perf: Configure BHRB filter before enabling PMU interrupts
        crypto/nx/nx-842: Fix handling of vmalloc addresses
        powerpc/pseries: Select ARCH_RANDOM on pseries
        powerpc/perf: Add Power8 cache & TLB events
        powerpc/relocate fix relocate processing in LE mode
        powerpc: Fix kdump hang issue on p8 with relocation on exception enabled.
        powerpc/pseries: Disable relocation on exception while going down during crash.
        powerpc/eeh: Drop taken reference to driver on eeh_rmv_device
        powerpc: Fix build failure in sysdev/mpic.c for MPIC_WEIRD=y
  2. 11 Feb, 2014 39 commits
    • Merge tag 'dt-fixes-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · bbb19555
      Linus Torvalds authored
      Pull DeviceTree fixes from Rob Herring:
      
       - Fix compile error drivers/spi/spi-rspi.c with !CONFIG_OF
       - Fix warnings for unused/uninitialized variables with !CONFIG_OF
       - Fix PCIe bus matching for powerpc
       - Add documentation for various vendor strings
      
      * tag 'dt-fixes-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        DT: Add vendor prefix for Spansion Inc.
        of/device: Nullify match table in of_match_device() for CONFIG_OF=n
        dt-bindings: add vendor-prefix for neonode
        of: fix PCI bus match for PCIe slots
        of: restructure for_each macros to fix compile warnings
        of: add vendor prefix for Honeywell
        of: Update qcom vendor prefix description
        of: add vendor prefix for Allwinner Technology
    • Merge tag 'microblaze-3.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze · 738b52bb
      Linus Torvalds authored
      Pull microblaze fixes from Michal Simek:
       - Fix two compilation issues - HZ, readq/writeq
       - Fix stack protection support
      
      * tag 'microblaze-3.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Fix a typo when disabling stack protection
        microblaze: Define readq and writeq IO helper function
        microblaze: Fix missing HZ macro
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · a87af778
      Linus Torvalds authored
      Pull s390 bugfixes from Martin Schwidefsky:
       "A collection of bug fixes.  Most of them are minor but two of them are
        more severe.  The linkage stack bug can be used by user space to force
        an oops, with panic_on_oops this is a denial-of-service.  And the dump
        memory detection issue can cause incomplete memory dumps"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/cio: improve cio_commit_config
        s390: fix kernel crash due to linkage stack instructions
        s390/dump: Fix dump memory detection
        s390/appldata: restore missing init_virt_timer()
        s390/qdio: correct program-controlled interruption checking
        s390/qdio: for_each macro correctness
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 16e5a2ed
      Linus Torvalds authored
      Pull networking updates from David Miller:
      
       1) Fix flexcan build on big endian, from Arnd Bergmann
      
       2) Correctly attach cpsw to GPIO bitbang MDIO drive, from Stefan Roese
      
       3) udp_add_offload has to use GFP_ATOMIC since it can be invoked from
          non-sleepable contexts.  From Or Gerlitz
      
       4) vxlan_gro_receive() does not iterate over all possible flows
          properly, fix also from Or Gerlitz
      
       5) CAN core doesn't use a proper SKB destructor when it hooks up
          sockets to SKBs.  Fix from Oliver Hartkopp
      
       6) ip_tunnel_xmit() can use an uninitialized route pointer, fix from
          Eric Dumazet
      
       7) Fix address family assignment in IPVS, from Michal Kubecek
      
       8) Fix ath9k build on ARM, from Sujith Manoharan
      
       9) Make sure fail_over_mac only applies for the correct bonding modes,
          from Ding Tianhong
      
      10) The udp offload code doesn't use RCU correctly, from Shlomo Pongratz
      
      11) Handle gigabit features properly in generic PHY code, from Florian
          Fainelli
      
      12) Don't blindly invoke link operations in
          rtnl_link_get_slave_info_data_size, they are optional.  Fix from
          Fernando Luis Vazquez Cao
      
      13) Add USB IDs for Netgear Aircard 340U, from Bjørn Mork
      
      14) Handle netlink packet padding properly in openvswitch, from Thomas
          Graf
      
      15) Fix oops when deleting chains in nf_tables, from Patrick McHardy
      
      16) Fix RX stalls in xen-netback driver, from Zoltan Kiss
      
      17) Fix deadlock in mac80211 stack, from Emmanuel Grumbach
      
      18) inet_nlmsg_size() forgets to consider ifa_cacheinfo, fix from Geert
          Uytterhoeven
      
      19) tg3_change_mtu() can deadlock, fix from Nithin Sujir
      
      20) Fix regression in setting SCTP local source addresses on accepted
          sockets, caused by some generic ipv6 socket changes.  Fix from
          Matija Glavinic Pecotic
      
      21) IPPROTO_* must be pure defines, otherwise module aliases don't get
          constructed properly.  Fix from Jan Moskyto
      
      22) IPV6 netconsole setup doesn't work properly unless an explicit
          source address is specified, fix from Sabrina Dubroca
      
      23) Use __GFP_NORETRY for high order skb page allocations in
          sock_alloc_send_pskb and skb_page_frag_refill.  From Eric Dumazet
      
      24) Fix a regression added in netconsole over bridging, from Cong Wang
      
      25) TCP uses an artificial offset of 1ms for SRTT, but this doesn't jive
          well with TCP pacing which needs the SRTT to be accurate.  Fix from
          Eric Dumazet
      
      26) Several cases of missing header file includes from Rashika Kheria
      
      27) Add ZTE MF667 device ID to qmi_wwan driver, from Raymond Wanyoike
      
      28) TCP Small Queues doesn't handle nonagle properly in some corner
          cases, fix from Eric Dumazet
      
      29) Remove extraneous read_unlock in bond_enslave, whoops.  From Ding
          Tianhong
      
      30) Fix 9p trans_virtio handling of vmalloc buffers, from Richard Yao
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (136 commits)
        6lowpan: fix lockdep splats
        alx: add missing stats_lock spinlock init
        9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers
        bonding: remove unwanted bond lock for enslave processing
        USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support
        tcp: tsq: fix nonagle handling
        bridge: Prevent possible race condition in br_fdb_change_mac_address
        bridge: Properly check if local fdb entry can be deleted when deleting vlan
        bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port
        bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address
        bridge: Fix the way to check if a local fdb entry can be deleted
        bridge: Change local fdb entries whenever mac address of bridge device changes
        bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address
        bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
        bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr
        tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min
        net: vxge: Remove unused device pointer
        net: qmi_wwan: add ZTE MF667
        3c59x: Remove unused pointer in vortex_eisa_cleanup()
        net: fix 'ip rule' iif/oif device rename
        ...
    • powerpc/powernv: Add iommu DMA bypass support for IODA2 · cd15b048
      Benjamin Herrenschmidt authored
      This patch adds support for creating a direct iommu "bypass"
      window on IODA2 bridges (such as Power8), which lets 64-bit DMA
      capable devices bypass iommu page translation completely and thus
      significantly improves DMA performance.
      
      Additionally, this adds a hook to the struct iommu_table so that
      the IOMMU API / VFIO can disable the bypass when external ownership
      is requested, since in that case, the device will be used by an
      environment such as userspace or a KVM guest which must not be
      allowed to bypass translations.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • 6lowpan: fix lockdep splats · 20e7c4e8
      Eric Dumazet authored
      When a device's ndo_start_xmit() calls dev_queue_xmit() again,
      lockdep can complain because dev_queue_xmit() is re-entered and the
      spinlocks protecting tx queues share a common lockdep class.
      
      Same issue was fixed for bonding/l2tp/ppp in commits
      
      0daa2303 ("[PATCH] bonding: lockdep annotation")
      49ee4920 ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat")
      23d3b8bf ("net: qdisc busylock needs lockdep annotations ")
      303c07db ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ")
      Reported-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • alx: add missing stats_lock spinlock init · 3e5ccc29
      John Greene authored
      Trivial fix for init time stack trace occurring in
      alx_get_stats64 upon start up. Should have been part of
      commit adding the spinlock:
      f1b6b106 alx: add alx_get_stats64 operation
      Signed-off-by: John Greene <jogreene@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers · b6f52ae2
      Richard Yao authored
      The 9p-virtio transport does zero copy on things larger than 1024 bytes
      in size. It accomplishes this by returning the physical addresses of
      pages to the virtio-pci device. At present, the translation is usually a
      bit shift.
      
      That approach produces an invalid page address when we read/write to
      vmalloc buffers, such as those used for Linux kernel modules. Any
      attempt to load a Linux kernel module from 9p-virtio produces the
      following stack.
      
      [<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510
      [<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0
      [<ffffffff814839dd>] p9_client_read+0x15d/0x240
      [<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0
      [<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20
      [<ffffffff811c84e7>] v9fs_file_read+0x37/0x70
      [<ffffffff8114e3fb>] vfs_read+0x9b/0x160
      [<ffffffff81153571>] kernel_read+0x41/0x60
      [<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180
      
      Subsequently, QEMU will die printing:
      
      qemu-system-x86_64: virtio: trying to map MMIO memory
      
      This patch enables 9p-virtio to correctly handle this case. This not
      only enables us to load Linux kernel modules off virtfs, but also
      enables ZFS file-based vdevs on virtfs to be used without killing QEMU.
      
      Special thanks to both Avi Kivity and Alexander Graf for their
      interpretation of QEMU backtraces. Without their guidance, tracking down
      this bug would have taken much longer. Also, special thanks to Linus
      Torvalds for his insightful explanation of why this should use
      is_vmalloc_addr() instead of is_vmalloc_or_module_addr():
      
      https://lkml.org/lkml/2014/2/8/272
      Signed-off-by: Richard Yao <ryao@gentoo.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
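The address translation problem described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the patch itself: the address layout, the frame table, and every `model_*` name are hypothetical. In the kernel, the check would presumably be is_vmalloc_addr() (named in the commit message) and the two translation paths would be the per-page lookup for vmalloc memory versus the linear shift used for directly mapped memory.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)

/* Hypothetical layout for this model only: a linear "direct map"
 * region and a separate "vmalloc" region backed by scattered frames. */
#define DIRECT_BASE   0x10000000UL
#define VMALLOC_BASE  0x80000000UL
#define VMALLOC_PAGES 4

/* Frame number backing each vmalloc page (deliberately non-contiguous). */
static const unsigned long vmalloc_frames[VMALLOC_PAGES] = { 7, 42, 3, 19 };

/* True when the address falls inside the modelled vmalloc region. */
int model_is_vmalloc_addr(unsigned long vaddr)
{
    return vaddr >= VMALLOC_BASE &&
           vaddr < VMALLOC_BASE + VMALLOC_PAGES * PAGE_SIZE;
}

/* Linear translation: only meaningful for direct-map addresses. */
unsigned long model_linear_pfn(unsigned long vaddr)
{
    return (vaddr - DIRECT_BASE) >> PAGE_SHIFT;
}

/* Choose the translation by address type: a per-page lookup for
 * vmalloc addresses, the cheap shift for everything else. */
unsigned long model_addr_to_pfn(unsigned long vaddr)
{
    if (model_is_vmalloc_addr(vaddr))
        return vmalloc_frames[(vaddr - VMALLOC_BASE) >> PAGE_SHIFT];
    return model_linear_pfn(vaddr);
}
```

In this model, applying model_linear_pfn() to a vmalloc address yields a frame number that has nothing to do with the backing frame, which mirrors the invalid page address the commit message describes.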
    • bonding: remove unwanted bond lock for enslave processing · 6b8790b5
      dingtianhong authored
      The bond enslave processing doesn't hold bond->lock anymore, so
      releasing the unlocked rw lock causes a warning message. Remove
      the unwanted read_unlock(&bond->lock).
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Acked-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • powerpc: Fix endian issues in kexec and crash dump code · ea961a82
      Anton Blanchard authored
      We expose a number of OF properties in the kexec and crash dump code
      and these need to be big endian.
      
      Cc: stable@vger.kernel.org # v3.13
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/ppc32: Fix the bug in the init of non-base exception stack for UP · 04a34113
      Kevin Hao authored
      We allocate one specific exception stack for each kind of
      non-base exception for every CPU. For ppc32 the CPU hard ID is
      used as the subscript to get the specific exception stack for
      one CPU. But for a UP kernel, there is only one element in
      each kind of exception stack array. We would get stuck if the
      CPU hard ID is not equal to '0'. So in this case we should use the
      subscript '0' no matter what the CPU hard ID is.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
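The indexing rule from this commit message can be sketched as a tiny helper. This is a simplified model, not the kernel code: the function name and both parameters are hypothetical, and the real decision is presumably made at build time rather than through a run-time argument.

```c
#include <stddef.h>

/* Hypothetical model: nr_stacks is the length of one exception stack
 * array (1 on a UP build), hard_id is the CPU's hardware ID. */
size_t exception_stack_index(size_t nr_stacks, size_t hard_id)
{
    /* A UP build has a single slot, so a non-zero hard ID would index
     * past the end of the array. Always use slot 0 in that case. */
    if (nr_stacks == 1)
        return 0;
    return hard_id;
}
```

For example, a UP build booted on a CPU whose hard ID is 5 would use index 0, while an SMP build with four slots keeps using the hard ID directly.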
    • powerpc/xmon: Don't signal we've entered until we're finished printing · d2b496e5
      Michael Ellerman authored
      Currently we set our cpu's bit in cpus_in_xmon, and then we take the
      output lock and print the exception information.
      
      This can race with the master cpu entering the command loop and printing
      the backtrace. The result is that the backtrace gets garbled with
      another cpu's exception print out.
      
      Fix it by delaying the set of cpus_in_xmon until we are finished
      printing.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/xmon: Fix timeout loop in get_output_lock() · 15075897
      Michael Ellerman authored
      As far as I can tell, our 70s era timeout loop in get_output_lock() is
      generating no code.
      
      This leads to the hostile takeover happening more or less simultaneously
      on all cpus. The result is "interesting", some example output that is
      more readable than most:
      
          cpu 0x1: Vector: 100 (Scypsut e0mx bR:e setV)e catto xc0p:u[ c 00
          c0:0  000t0o0V0erc0td:o5 rfc28050000]0c00 0 0  0 6t(pSrycsV1ppuot
          uxe 1m 2 0Rx21e3:0s0ce000c00000t00)00 60602oV2SerucSayt0y 0p 1sxs
      
      Fix it by using udelay() in the timeout loop. The wait time and check
      frequency are arbitrary, but seem to work OK. We already rely on
      udelay() working so this is not a new dependency.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/xmon: Don't loop forever in get_output_lock() · 730efb61
      Michael Ellerman authored
      If we enter with xmon_speaker != 0 we skip the first cmpxchg(), we also
      skip the while loop because xmon_speaker != last_speaker (0) - meaning we
      skip the second cmpxchg() also.
      
      Following that code path the compiler sees no memory barriers and so is
      within its rights to never reload xmon_speaker. The end result is we loop
      forever.
      
      This manifests as all cpus being in xmon ('c' command), but they refuse
      to take control when you switch to them ('c x' for cpu # x).
      
      I have seen this deadlock in practice and also checked the generated code to
      confirm this is what's happening.
      
      The simplest fix is just to always try the cmpxchg().
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
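The "always try the cmpxchg()" idea can be sketched in portable C11. This is a userspace model under stated assumptions, not the xmon code: the function name, the use of C11 atomics in place of the kernel's cmpxchg(), and the `max_polls` budget (standing in for the udelay()-based timeout) are all illustrative choices.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of a "speaker" lock: 0 means free, any other value is the
 * owner's ID. Returns true once `me` owns the lock. `max_polls` is an
 * arbitrary stand-in for the real timeout. */
bool model_get_output_lock(atomic_int *speaker, int me, int max_polls)
{
    if (atomic_load(speaker) == me)
        return true;                      /* already ours */

    for (;;) {
        int holder = 0;

        /* Always attempt the compare-and-swap first. On failure,
         * `holder` is updated with the current owner, so every pass
         * through the loop re-reads the shared variable. */
        if (atomic_compare_exchange_strong(speaker, &holder, me))
            return true;

        /* Poll until the holder changes or the budget runs out. */
        int polls = max_polls;
        while (atomic_load(speaker) == holder && --polls > 0)
            ;

        /* Timed out with the same holder: attempt a takeover. */
        if (atomic_load(speaker) == holder &&
            atomic_compare_exchange_strong(speaker, &holder, me))
            return true;

        /* Otherwise the holder changed under us: retry from the top. */
    }
}
```

Because every iteration goes through an atomic operation, the compiler cannot hoist the read of the shared variable out of the loop, which is the property the commit message says the original code path lacked.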
    • powerpc/perf: Configure BHRB filter before enabling PMU interrupts · b4d6c06c
      Anshuman Khandual authored
      Right now the config_bhrb() PMU specific call happens after
      write_mmcr0(), which actually enables the PMU for event counting and
      interrupts. So there is a small window of time where the PMU and BHRB
      run without the required HW branch filter (if any) enabled in BHRB.
      
      This can cause some of the branch samples to be collected through BHRB
      without any filter applied and hence affects the correctness of
      the results. This patch moves the BHRB config function call before
      enabling interrupts.
      
      Here are some data points captured via trace prints which depicts how we
      could get PMU interrupts with BHRB filter NOT enabled with a standard
      perf record command line (asking for branch record information as well).
      
          $ perf record -j any_call ls
      
      Before the patch:-
      
          ls-1962  [003] d...  2065.299590: .perf_event_interrupt: MMCRA: 40000000000
          ls-1962  [003] d...  2065.299603: .perf_event_interrupt: MMCRA: 40000000000
          ...
      
          All the PMU interrupts before this point did not have the requested
          HW branch filter enabled in the MMCRA.
      
          ls-1962  [003] d...  2065.299647: .perf_event_interrupt: MMCRA: 40040000000
          ls-1962  [003] d...  2065.299662: .perf_event_interrupt: MMCRA: 40040000000
      
      After the patch:-
      
          ls-1850  [008] d...   190.311828: .perf_event_interrupt: MMCRA: 40040000000
          ls-1850  [008] d...   190.311848: .perf_event_interrupt: MMCRA: 40040000000
      
          All the PMU interrupts have the requested HW BHRB branch filter
          enabled in MMCRA.
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      [mpe: Fixed up whitespace and cleaned up changelog]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • crypto/nx/nx-842: Fix handling of vmalloc addresses · 0ba3e101
      Nathan Fontenot authored
      The powerpc specific nx-842 compression driver does not currently
      handle translating a vmalloc address to a physical address.
      
      The current driver uses __pa() for all addresses which does not
      properly handle vmalloc addresses and thus causes a failure since
      we do not pass a proper physical address to the hypervisor.
      
      This patch adds a routine to convert an address to a physical
      address by checking for vmalloc addresses and handling them properly.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
       ---
       drivers/crypto/nx/nx-842.c |   29 +++++++++++++++++++----------
       1 file changed, 19 insertions(+), 10 deletions(-)
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Select ARCH_RANDOM on pseries · 8d4887ee
      Michael Ellerman authored
      We have a driver for the ARCH_RANDOM hook in rng.c, so we should select
      ARCH_RANDOM on pseries.
      
      Without this the build breaks if you turn ARCH_RANDOM off.
      
      This hasn't broken the build because pseries_defconfig doesn't specify a
      value for PPC_POWERNV, which is default y, and selects ARCH_RANDOM.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/relocate fix relocate processing in LE mode · 3b830c82
      Laurent Dufour authored
      The relocation code does not work in little endian mode because the r_info
      field, which is a 64-bit value, must be read from the right offset.
      
      The current code is optimized to read the r_info field as a 32-bit value
      starting at the middle of the double word (offset 12). When running in LE
      mode, the value read is not correct since only the most significant bits
      are read.
      
      This patch removes this optimization, which consisted of handling a 32-bit
      value instead of a 64-bit one. This way it works in both big and little
      endian mode.
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
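The offset problem described above can be reproduced in plain C by laying out one relocation entry in both byte orders. This is an illustrative sketch, not the kernel's relocation code (which is not shown here): all function names are hypothetical, and the only facts relied on are the ELF64 Rela layout (r_offset at byte 0, r_info at byte 8, r_addend at byte 16) and that the relocation type occupies the low 32 bits of r_info.

```c
#include <stdint.h>

/* Offset of r_info inside a 24-byte Elf64_Rela entry. */
#define R_INFO_OFFSET 8

/* Read a 32-bit value stored at p in the given byte order. */
uint32_t read_u32(const uint8_t *p, int little_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (little_endian ? 8 * i : 8 * (3 - i));
    return v;
}

/* Read a 64-bit value stored at p in the given byte order. */
uint64_t read_u64(const uint8_t *p, int little_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (little_endian ? 8 * i : 8 * (7 - i));
    return v;
}

/* Store a 64-bit value at p in the given byte order. */
void write_u64(uint8_t *p, uint64_t v, int little_endian)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (little_endian ? 8 * i : 8 * (7 - i)));
}

/* Shortcut: a 32-bit load at offset 12. This is the low word of
 * r_info on big endian only. */
uint32_t reloc_type_shortcut(const uint8_t *rela, int little_endian)
{
    return read_u32(rela + R_INFO_OFFSET + 4, little_endian);
}

/* Portable: load all 64 bits of r_info, then mask out the type. */
uint32_t reloc_type_full(const uint8_t *rela, int little_endian)
{
    return (uint32_t)(read_u64(rela + R_INFO_OFFSET, little_endian)
                      & 0xffffffffu);
}
```

With r_info set to symbol index 5 and type 22, the shortcut returns the type on a big endian image but returns the symbol index (the upper half) on a little endian one, while the full 64-bit read returns the type in both cases.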
    • powerpc: Fix kdump hang issue on p8 with relocation on exception enabled. · 429d2e83
      Mahesh Salgaonkar authored
      On p8 systems, with the relocation on exception feature enabled, we are
      seeing the kdump kernel hang at interrupt vector 0xc*4400. The reason is
      that, with this feature enabled, exceptions are raised with the MMU on
      (IR=DR=1) at the default offset of 0xc*4000. Since the exception is raised
      in virtual mode, it requires the vector region to be executable, without
      which it fails to fetch and execute the instruction at 0xc*4xxx. For the
      default kernel, since the kernel is loaded at real 0, the htab mappings set
      the entire kernel text region executable. But for a relocatable kernel
      (e.g. the kdump case) we only copy the interrupt vectors down to real 0 and
      never marked that region as executable, because on p7 and below we always
      get exceptions in real mode.
      
      This patch fixes this issue by marking the htab mapping range that overlaps
      with the interrupt vector region as executable for a relocatable kernel.
      
      Thanks to Ben who helped me to debug this issue and find the root cause.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Disable relocation on exception while going down during crash. · 3ec8b78f
      Mahesh Salgaonkar authored
      Disable relocation on exception while going down, even in the kdump case.
      This is because we are about to clear the htab mappings while kexec-ing into
      the kdump kernel, and we may run into issues if we still have AIL ON.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/eeh: Drop taken reference to driver on eeh_rmv_device · 8cc6b6cd
      Thadeu Lima de Souza Cascardo authored
      Commit f5c57710 ("powerpc/eeh: Use
      partial hotplug for EEH unaware drivers") introduces eeh_rmv_device,
      which may grab a reference to a driver, but not release it.
      
      That prevents a driver from being removed after it has gone through EEH
      recovery.
      
      This patch drops the reference if it was taken.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix build failure in sysdev/mpic.c for MPIC_WEIRD=y · 0215b4aa
      Paul Gortmaker authored
      Commit 446f6d06 ("powerpc/mpic: Properly
      set default triggers") breaks the mpc7447_hpc_defconfig as follows:
      
        CC      arch/powerpc/sysdev/mpic.o
      arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type':
      arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant
      arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant
      arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant
      arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant
      
      Looking at the cpp output (gcc 4.7.3), I see:
      
         case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] |
              mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]:
      
      The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set
      for this platform, thus enabling the following:
      
        -------------------
        #ifdef CONFIG_MPIC_WEIRD
        static u32 mpic_infos[][MPIC_IDX_END] = {
              [0] = { /* Original OpenPIC compatible MPIC */
      
        [...]
      
        #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name]
      
        #else /* CONFIG_MPIC_WEIRD */
      
        #define MPIC_INFO(name) MPIC_##name
      
        #endif /* CONFIG_MPIC_WEIRD */
        -------------------
      
      Here we convert the case section to if/else if, and also add
      the equivalent of a default case to warn about unknown types.
      Boot tested on sbc8548, build tested on all defconfigs.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
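The switch-to-if/else conversion described in this commit can be sketched with a standalone example. This is not the mpic.c change itself: the struct, the enum, and the function name are hypothetical, and the table values are arbitrary. It only illustrates the language rule involved, namely that case labels must be integer constant expressions while an if/else chain can compare against values loaded at run time.

```c
#include <stdint.h>

/* Hypothetical per-controller table, standing in for values that are
 * only known at run time (as with mpic->hw_set[] in the commit). */
struct ctrl {
    uint32_t sense_edge;
    uint32_t sense_level;
    uint32_t pol_positive;
    uint32_t pol_negative;
};

enum trigger {
    TRIG_UNKNOWN,
    TRIG_EDGE_RISING,
    TRIG_EDGE_FALLING,
    TRIG_LEVEL_HIGH,
    TRIG_LEVEL_LOW
};

/* Writing `case c->sense_edge | c->pol_positive:` would not compile,
 * because the label is not an integer constant expression. The
 * if/else chain below has no such restriction. */
enum trigger decode_trigger(const struct ctrl *c, uint32_t vecpri)
{
    if (vecpri == (c->sense_edge | c->pol_positive))
        return TRIG_EDGE_RISING;
    else if (vecpri == (c->sense_edge | c->pol_negative))
        return TRIG_EDGE_FALLING;
    else if (vecpri == (c->sense_level | c->pol_positive))
        return TRIG_LEVEL_HIGH;
    else if (vecpri == (c->sense_level | c->pol_negative))
        return TRIG_LEVEL_LOW;

    /* Equivalent of a default case: report an unknown type. */
    return TRIG_UNKNOWN;
}
```

The trailing return plays the role of the default case mentioned in the commit message; a real driver would presumably emit a warning there instead of returning a sentinel.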
    • Merge branch 'akpm' (patches from Andrew Morton) · 6792dfe3
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "A bunch of fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        ocfs2: check existence of old dentry in ocfs2_link()
        ocfs2: update inode size after zeroing the hole
        ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
        mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED
        smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls
        mm: fix page leak at nfs_symlink()
        slub: do not assert not having lock in removing freed partial
        gitignore: add all.config
        ocfs2: fix ocfs2_sync_file() if filesystem is readonly
        drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
        fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
        xen: properly account for _PAGE_NUMA during xen pte translations
        mm/slub.c: list_lock may not be held in some circumstances
        drivers/md/bcache/extents.c: use %zi to format size_t
        vmcore: prevent PT_NOTE p_memsz overflow during header update
        drivers/message/i2o/i2o_config.c: fix deadlock in compat_ioctl(I2OGETIOPS)
        Documentation/: update 00-INDEX files
        checkpatch: fix detection of git repository
        get_maintainer: fix detection of git repository
        drivers/misc/sgi-gru/grukdump.c: unlocking should be conditional in gru_dump_context()
    • ocfs2: check existence of old dentry in ocfs2_link() · 0e048316
      Xue jiufei authored
      The linkat system call first calls user_path_at() to check the existence of
      the old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
      work.  A race can occur when node A creates a hard link for a file
      while node B removes it.
      
               Node A                          Node B
      user_path_at()
        ->ocfs2_lookup(),
      find old dentry exist
                                      rm file, add inode say inodeA
                                      to orphan_dir
      
      call ocfs2_link(),create a
      hard link for inodeA.
      
                                      rm the link, add inodeA to orphan_dir
                                      again
      
      When the orphan_scan work starts, it calls ocfs2_queue_orphans() to do the
      main work.  It first traverses the entries in orphan_dir, linking all inodes
      in this orphan_dir into a list that looks like this:
      
      	inodeA->inodeB->...->inodeA
      
      When traversing this list, it falls into a loop, calling iput() again
      and again, and finally triggers BUG_ON(inode->i_state & I_CLEAR).
      Signed-off-by: joyce <xuejiufei@huawei.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: update inode size after zeroing the hole · c7d2cbc3
      Junxiao Bi authored
      fs-writeback releases, without taking the page lock, dirty pages whose
      offsets are beyond the inode size; the release happens in
      block_write_full_page_endio().  If the inode size is not updated, dirty
      pages in file holes may be released before they are flushed to disk, so
      the file holes will contain some non-zero data, which causes md5sum
      errors on sparse files.
      
      To reproduce the bug, find a big sparse file with many holes, such as a vm
      image file whose actual size is bigger than the available memory (to
      make writeback work more frequently), tar it with the -S option, then
      untar it and check its md5sum repeatedly until you get a wrong
      md5sum.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Younger Liu <younger.liu@huawei.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size · d62e74be
      Younger Liu authored
      The issue scenario is as follows:
      
      - Create a small file and fallocate a large amount of disk space for
        it with the FALLOC_FL_KEEP_SIZE option.
      
      - ftruncate the file back to its original size.  The free disk space
        does not change back.  This is a real bug that is fixed in this
        patch.
      
      In order to solve the issue above, we modified ocfs2_setattr(): if
      attr->ia_size != i_size_read(inode), it calls ocfs2_truncate_file()
      and truncates disk space to attr->ia_size.
      Signed-off-by: Younger Liu <younger.liu@huawei.com>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Tested-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Reviewed-by: Jensen <shencanquan@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED · 8d547ff4
      Naoya Horiguchi authored
      mce-test detected a test failure when injecting error to a thp tail
      page.  This is because we take page refcount of the tail page in
      madvise_hwpoison() while the fix in commit a3e0f9e4
      ("mm/memory-failure.c: transfer page count from head page to tail page
      after split thp") assumes that we always take refcount on the head page.
      
      When a real memory error happens we take refcount on the head page where
      memory_failure() is called without MF_COUNT_INCREASED set, so it seems
      to me that testing memory error on thp tail page using madvise makes
      little sense.
      
      This patch cancels moving refcount in !MF_COUNT_INCREASED for valid
      testing.
      
      [akpm@linux-foundation.org: s/&&/&/]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Chen Gong <gong.chen@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[3.9+: a3e0f9e4]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls · fb37bb04
      Paul Gortmaker authored
      Use what we already do for arch_disable_smp_support() to fix these:
      
        arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
        arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
        kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
        kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix page leak at nfs_symlink() · a0b54add
      Rafael Aquini authored
      Changes in commit a0b8cab3 ("mm: remove lru parameter from
      __pagevec_lru_add and remove parts of pagevec API") have introduced a
      call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as
      now the page gets an extra refcount that is not dropped.
      
      Jan Stancek observed and reported the leak effect while running test8
      from Connectathon Testsuite.  After several iterations over the test
      case, which creates several symlinks on a NFS mountpoint, the test
      system was quickly getting into an out-of-memory scenario.
      
      This patch fixes the page leak by dropping that extra refcount
      add_to_page_cache_lru() is grabbing.
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: <stable@vger.kernel.org>	[3.11.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: do not assert not having lock in removing freed partial · 1e4dd946
      Steven Rostedt authored
      Vladimir reported the following issue:
      
      Commit c65c1877 ("slub: use lockdep_assert_held") requires
      remove_partial() to be called with n->list_lock held, but free_partial()
      called from kmem_cache_close() on cache destruction does not follow this
      rule, leading to a warning:
      
        WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
        Modules linked in:
        CPU: 0 PID: 2787 Comm: modprobe Tainted: G        W    3.14.0-rc1-mm1+ #1
        Hardware name:
         0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
         0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
         ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
        Call Trace:
          __kmem_cache_shutdown+0x1b2/0x1f0
          kmem_cache_destroy+0x43/0xf0
          xfs_destroy_zones+0x103/0x110 [xfs]
          exit_xfs_fs+0x38/0x4e4 [xfs]
          SyS_delete_module+0x19a/0x1f0
          system_call_fastpath+0x16/0x1b
      
      His solution was to add a spinlock in order to quiet lockdep.  Although
      there would be no contention to adding the lock, that lock also requires
      disabling of interrupts which will have a larger impact on the system.
      
      Instead of adding a spinlock to a location where it is not needed for
      lockdep, make a __remove_partial() function that does not test if the
      list_lock is held, as no one should have it due to it being freed.
      
      Also added a __add_partial() function that does not do the lock
      validation either, as it is not needed for the creation of the cache.
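
      A userspace sketch of the checked/unchecked split described above.
      assert() on a plain flag stands in for lockdep_assert_held(), and
      struct node with its fields is illustrative only, not the SLUB
      structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative per-node state; the real structure is far richer. */
struct node {
	bool list_lock_held;      /* stand-in for the n->list_lock state */
	unsigned long nr_partial; /* number of slabs on the partial list */
};

/* Unchecked variant: for callers that own the object exclusively. */
static void __remove_partial(struct node *n)
{
	n->nr_partial--;
}

/* Checked variant: normal callers must hold the lock. */
static void remove_partial(struct node *n)
{
	assert(n->list_lock_held); /* stands in for lockdep_assert_held() */
	__remove_partial(n);
}
```

      The teardown path would call the unchecked variant directly, while
      every other caller keeps the assertion.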
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Vladimir Davydov <vdavydov@parallels.com>
      Suggested-by: David Rientjes <rientjes@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gitignore: add all.config · 25fba9be
      Borislav Petkov authored
      This is used by kbuild to load preset Kconfig options.  We need to
      ignore it, otherwise git clean kills it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix ocfs2_sync_file() if filesystem is readonly · a987c7ca
      Younger Liu authored
      If filesystem is readonly, there is no need to flush drive's caches or
      force any uncommitted transactions.
      
      [akpm@linux-foundation.org: return -EROFS, not 0]
      Signed-off-by: Younger Liu <younger.liucn@gmail.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero · 79040cad
      Prarit Bhargava authored
      If you do
      
        echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec
      
      the following stack trace is output because the edac module is not
      designed to poll with a timeout of zero.
      
        WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
        list_add corruption. prev->next should be next (ffff8808291dd1b8), but was           (null). (prev=ffff8808286fe3f8).
        Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
        CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
        Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
        Call Trace:
         <IRQ>
          __list_add+0xac/0xc0
          __internal_add_timer+0xab/0x130
          internal_add_timer+0x17/0x40
          mod_timer_pinned+0xca/0x170
          intel_pstate_timer_func+0x28a/0x380
          call_timer_fn+0x36/0x100
          run_timer_softirq+0x1ff/0x2f0
          __do_softirq+0xf5/0x2e0
          irq_exit+0x10d/0x120
          smp_apic_timer_interrupt+0x45/0x60
          apic_timer_interrupt+0x6d/0x80
         <EOI>
          cpuidle_idle_call+0xb9/0x1f0
          arch_cpu_idle+0xe/0x30
          cpu_startup_entry+0x9e/0x240
          start_secondary+0x1e4/0x290
      
        kernel BUG at kernel/timer.c:1084!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
        CPU: 12 PID: 0 Comm: swapper/12 Tainted: G        W    3.13.0+ #1
        Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
        Call Trace:
         <IRQ>
          run_timer_softirq+0x245/0x2f0
          __do_softirq+0xf5/0x2e0
          irq_exit+0x10d/0x120
          smp_apic_timer_interrupt+0x45/0x60
          apic_timer_interrupt+0x6d/0x80
         <EOI>
          cpuidle_idle_call+0xb9/0x1f0
          arch_cpu_idle+0xe/0x30
          cpu_startup_entry+0x9e/0x240
          start_secondary+0x1e4/0x290
        RIP   cascade+0x93/0xa0
      
        WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
        Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
        CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G        W    3.13.0+ #1
        Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
        Workqueue: edac-poller edac_mc_workq_function [edac_core]
        Call Trace:
          dump_stack+0x45/0x56
          warn_slowpath_common+0x7d/0xa0
          warn_slowpath_null+0x1a/0x20
          __queue_delayed_work+0xed/0x1a0
          queue_delayed_work_on+0x27/0x50
          edac_mc_workq_function+0x72/0xa0 [edac_core]
          process_one_work+0x17b/0x460
          worker_thread+0x11b/0x400
          kthread+0xd2/0xf0
          ret_from_fork+0x7c/0xb0
      
      This patch adds a range check in the edac_mc_poll_msec code to check for 0.
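
      A userspace sketch of such a range check, not the code from the patch.
      parse_poll_msec() is a hypothetical helper that uses strtol() where
      kernel code would use its own string-parsing routines:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a poll interval in milliseconds and reject
 * zero, negative, non-numeric and out-of-range values with -EINVAL.
 */
static int parse_poll_msec(const char *val, int *out)
{
	char *end;
	long l;

	errno = 0;
	l = strtol(val, &end, 0);
	if (errno || end == val)
		return -EINVAL;
	/* "echo 0 > file" appends a newline, so tolerate exactly one. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (l <= 0 || l > INT_MAX)
		return -EINVAL;

	*out = (int)l;
	return 0;
}
```

      With a check of this kind, writing 0 fails with -EINVAL instead of
      arming a timer with a zero delay.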
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem · 96c7a2ff
      Eric W. Biederman authored
      Recently due to a spike in connections per second memcached on 3
      separate boxes triggered the OOM killer from accept.  At the time the
      OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
      problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
      hold a bitmap, and there was sufficient fragmentation that the largest
      page available was 8KiB.
      
      I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
      but I do agree that order 3 allocations are very likely to succeed.
      
      There are always pathologies where order > 0 allocations can fail when
      there are copious amounts of free memory available.  Using the pigeon
      hole principle it is easy to show that it requires 1 page more than 50%
      of the pages being free to guarantee an order 1 (8KiB) allocation will
      succeed, 1 page more than 75% of the pages being free to guarantee an
      order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
      the pages being free to guarantee an order 3 allocation will succeed.
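
      The bound follows because each aligned block of 2^k pages can hold at
      most 2^k - 1 free pages without being entirely free.  A small sketch of
      that arithmetic, assuming an allocator that only hands out naturally
      aligned blocks and a total page count that is a multiple of the block
      size; min_free_pages_for_order() is a hypothetical helper:

```c
#include <stddef.h>

/*
 * Smallest number of free pages that guarantees at least one fully free,
 * naturally aligned block of (1 << order) pages among total_pages pages.
 * Assumes total_pages is a multiple of the block size.
 */
static size_t min_free_pages_for_order(size_t total_pages, unsigned int order)
{
	size_t block = (size_t)1 << order;
	size_t nblocks = total_pages / block;

	/* Worst case: every block has exactly one page in use. */
	return nblocks * (block - 1) + 1;
}
```

      For 1024 pages this gives 513, 769 and 897 pages for orders 1, 2 and
      3, which is one page more than 50%, 75% and 87.5% respectively.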
      
      A server churning memory with a lot of small requests and replies like
      memcached is a common case that if anything can will skew the odds
      against large pages being available.
      
      Therefore let's not give external applications a practical way to kill
      linux server applications, and specify __GFP_NORETRY to the kmalloc in
      alloc_fdmem.  Unless I am misreading the code, by the time the code
      reaches should_alloc_retry in __alloc_pages_slowpath (where
      __GFP_NORETRY becomes significant), we have already tried everything
      reasonable to allocate a page and the only thing left to do is wait.  So
      not waiting and falling back to vmalloc immediately seems like the
      reasonable thing to do even if there wasn't a chance of triggering the
      OOM killer.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xen: properly account for _PAGE_NUMA during xen pte translations · a9c8e4be
      Mel Gorman authored
      Steven Noonan forwarded a user's report where they had a problem starting
      vsftpd on a Xen paravirtualized guest, with this in dmesg:
      
        BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
        page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
        page flags: 0x2ffc0000000014(referenced|dirty)
        addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
        CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
        Call Trace:
          dump_stack+0x45/0x56
          print_bad_pte+0x22e/0x250
          unmap_single_vma+0x583/0x890
          unmap_vmas+0x65/0x90
          exit_mmap+0xc5/0x170
          mmput+0x65/0x100
          do_exit+0x393/0x9e0
          do_group_exit+0xcc/0x140
          SyS_exit_group+0x14/0x20
          system_call_fastpath+0x1a/0x1f
        Disabling lock debugging due to kernel taint
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1
      
      The issue could not be reproduced under an HVM instance with the same
      kernel, so it appears to be exclusive to paravirtual Xen guests.  He
      bisected the problem to commit 1667918b ("mm: numa: clear numa
      hinting information on mprotect") that was also included in 3.12-stable.
      
      The problem was related to how xen translates ptes because it was not
      accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
      a pteval_present helper for use by xen so both bare metal and xen use
      the same code when checking if a PTE is present.
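
      A sketch of the shape of that split, under stated assumptions: the
      flag bit positions below are illustrative values only, and pte_t here
      is a simplified stand-in for the architecture's type.  Only the names
      pte_present and pteval_present come from the description above:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t; /* simplified stand-in */

/* Illustrative bit positions only; the real ones live in arch headers. */
#define SKETCH_PAGE_PRESENT  ((pteval_t)1 << 0)
#define SKETCH_PAGE_PROTNONE ((pteval_t)1 << 8)
#define SKETCH_PAGE_NUMA     ((pteval_t)1 << 9)

/* Helper on the raw value, usable where only a pteval_t is at hand. */
static bool pteval_present(pteval_t pteval)
{
	return (pteval & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PROTNONE |
			  SKETCH_PAGE_NUMA)) != 0;
}

/* The pte_t wrapper shares the same test, so the two cannot diverge. */
static bool pte_present(pte_t pte)
{
	return pteval_present(pte.pte);
}
```

      A translation path that tested only the present bit would treat a PTE
      carrying just the NUMA hinting bit as not present, which is the kind of
      mismatch described above.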
      
      [mgorman@suse.de: wrote changelog, proposed minor modifications]
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: Steven Noonan <steven@uplinklabs.net>
      Tested-by: Steven Noonan <steven@uplinklabs.net>
      Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub.c: list_lock may not be held in some circumstances · 255d0884
      David Rientjes authored
      Commit c65c1877 ("slub: use lockdep_assert_held") incorrectly
      required that add_full() and remove_full() hold n->list_lock.  The lock
      is only taken when kmem_cache_debug(s), since that's the only time it
      actually does anything.
      
      Require that the lock only be taken under such a condition.
      Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/md/bcache/extents.c: use %zi to format size_t · bd180b4e
      Geert Uytterhoeven authored
        drivers/md/bcache/extents.c: In function `btree_ptr_bad_expensive':
        drivers/md/bcache/extents.c:196: warning: format `%li' expects type `long int', but argument 4 has type `size_t'
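
      A standalone sketch of the `z` length modifier, which sizes the
      conversion to match size_t on every architecture.  format_size() is a
      hypothetical helper, and it uses the unsigned form %zu where the patch
      title names %zi:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t portably.  "%li" is only correct where size_t happens
 * to be as wide as long; the z modifier is correct everywhere.
 */
static int format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "%zu", value);
}
```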
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>