1. 09 Apr, 2014 1 commit
  2. 08 Apr, 2014 14 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ce7613db
      Linus Torvalds authored
      Pull more networking updates from David Miller:
      
       1) If a VXLAN interface is created with no groups, we can crash on
          reception of packets.  Fix from Mike Rapoport.
      
       2) Missing includes in CPTS driver, from Alexei Starovoitov.
      
       3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
          and Dan Carpenter.
      
       4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers.  From
          Josh Boyer.
      
       5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
          Borkmann.
      
       6) Byte-Queue-Limit enabled drivers aren't handled properly in
          AF_PACKET transmit path, also from Daniel Borkmann.
      
          Same problem exists in pktgen, and Daniel fixed it there too.
      
       7) Fix resource leaks in driver probe error paths of new sxgbe driver,
          from Francois Romieu.
      
       8) Truesize of SKBs can gradually get more and more corrupted in NAPI
          packet recycling path, fix from Eric Dumazet.
      
       9) Fix uniprocessor netfilter build, from Florian Westphal.  In the
          longer term we should perhaps try to find a way for ARRAY_SIZE() to
          work even with zero sized array elements.
      
      10) Fix crash in netfilter conntrack extensions due to mis-estimation of
          required extension space.  From Andrey Vagin.
      
      11) Since we commit table rule updates before trying to copy the
          counters back to userspace (it's the last action we perform), we
          really can't signal the user copy with an error as we are beyond the
          point from which we can unwind everything.  This causes all kinds of
          use after free crashes and other mysterious behavior.
      
          From Thomas Graf.
      
      12) Restore previous behvaior of div/mod by zero in BPF filter
          processing.  From Daniel Borkmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
        net: sctp: wake up all assocs if sndbuf policy is per socket
        isdnloop: several buffer overflows
        netdev: remove potentially harmful checks
        pktgen: fix xmit test for BQL enabled devices
        net/at91_ether: avoid NULL pointer dereference
        tipc: Let tipc_release() return 0
        at86rf230: fix MAX_CSMA_RETRIES parameter
        mac802154: fix duplicate #include headers
        sxgbe: fix duplicate #include headers
        net: filter: be more defensive on div/mod by X==0
        netfilter: Can't fail and free after table replacement
        xen-netback: Trivial format string fix
        net: bcmgenet: Remove unnecessary version.h inclusion
        net: smc911x: Remove unused local variable
        bonding: Inactive slaves should keep inactive flag's value
        netfilter: nf_tables: fix wrong format in request_module()
        netfilter: nf_tables: set names cannot be larger than 15 bytes
        netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
        netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
        netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
        ...
      ce7613db
    • Linus Torvalds's avatar
      Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 0afccc4c
      Linus Torvalds authored
      Pull more staging patches from Greg KH:
       "Here are some more staging patches for 3.15-rc1.
      
        They include a late-submission of a wireless driver that a bunch of
        people seem to have the hardware for now.  As it's stand-alone, it
        should be fine (now passes the 0-day random build bot tests).
      
        There are also some fixes for the unisys drivers, as they were causing
        havoc on a number of different machines.  To resolve all of those
        issues, we just mark the driver as BROKEN now, and we can fix it up
        "properly" over time"
      
      * tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: rtl8723au: The 8723 only has two paths
        Staging: unisys: mark drivers as BROKEN
        Staging: unisys: verify that a control channel exists
        staging: unisys: Add missing close parentheses in filexfer.c
        staging: r8723au: Fix build problem when RFKILL is not selected
        staging: r8723au: Fix randconfig build errors
        staging: r8723au: Turn on build of new driver
        staging: r8723au: Additional source patches
        staging: r8723au: Add source files for new driver - part 4
        staging: r8723au: Add source files for new driver - part 3
        staging: r8723au: Add source files for new driver - part 2
        staging: r8723au: Add source files for new driver - part 1
      0afccc4c
    • Linus Torvalds's avatar
      Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · e4f30545
      Linus Torvalds authored
      Pull second set of arm64 updates from Catalin Marinas:
       "A second pull request for this merging window, mainly with fixes and
        docs clarification:
      
         - Documentation clarification on CPU topology and booting
           requirements
         - Additional cache flushing during boot (needed in the presence of
           external caches or under virtualisation)
         - DMA range invalidation fix for non cache line aligned buffers
         - Build failure fix with !COMPAT
         - Kconfig update for STRICT_DEVMEM"
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Fix DMA range invalidation for cache line unaligned buffers
        arm64: Add missing Kconfig for CONFIG_STRICT_DEVMEM
        arm64: fix !CONFIG_COMPAT build failures
        Revert "arm64: virt: ensure visibility of __boot_cpu_mode"
        arm64: Relax the kernel cache requirements for boot
        arm64: Update the TCR_EL1 translation granule definitions for 16K pages
        ARM: topology: Make it clear that all CPUs need to be described
      e4f30545
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · d586c86d
      Linus Torvalds authored
      Pull second set of s390 patches from Martin Schwidefsky:
       "The second part of Heikos uaccess rework, the page table walker for
        uaccess is now a thing of the past (yay!)
      
        The code change to fix the theoretical TLB flush problem allows us to
        add a TLB flush optimization for zEC12, this machine has new
        instructions that allow to do CPU local TLB flushes for single pages
        and for all pages of a specific address space.
      
        Plus the usual bug fixing and some more cleanup"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/uaccess: rework uaccess code - fix locking issues
        s390/mm,tlb: optimize TLB flushing for zEC12
        s390/mm,tlb: safeguard against speculative TLB creation
        s390/irq: Use defines for external interruption codes
        s390/irq: Add defines for external interruption codes
        s390/sclp: add timeout for queued requests
        kvm/s390: also set guest pages back to stable on kexec/kdump
        lcs: Add missing destroy_timer_on_stack()
        s390/tape: Add missing destroy_timer_on_stack()
        s390/tape: Use del_timer_sync()
        s390/3270: fix crash with multiple reset device requests
        s390/bitops,atomic: add missing memory barriers
        s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
      d586c86d
    • Daniel Borkmann's avatar
      net: sctp: wake up all assocs if sndbuf policy is per socket · 52c35bef
      Daniel Borkmann authored
      SCTP charges chunks for wmem accounting via skb->truesize in
      sctp_set_owner_w(), and sctp_wfree() respectively as the
      reverse operation. If a sender runs out of wmem, it needs to
      wait via sctp_wait_for_sndbuf(), and gets woken up by a call
      to __sctp_write_space() mostly via sctp_wfree().
      
      __sctp_write_space() is being called per association. Although
      we assign sk->sk_write_space() to sctp_write_space(), which
      is then being done per socket, it is only used if send space
      is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
      is set and therefore not invoked in sock_wfree().
      
      Commit 4c3a5bda ("sctp: Don't charge for data in sndbuf
      again when transmitting packet") fixed an issue where in case
      sctp_packet_transmit() manages to queue up more than sndbuf
      bytes, sctp_wait_for_sndbuf() will never be woken up again
      unless it is interrupted by a signal. However, a still
      remaining issue is that if net.sctp.sndbuf_policy=0, that is
      accounting per socket, and one-to-many sockets are in use,
      the reclaimed write space from sctp_wfree() is 'unfairly'
      handed back on the server to the association that is the lucky
      one to be woken up again via __sctp_write_space(), while
      the remaining associations are never be woken up again
      (unless by a signal).
      
      The effect disappears with net.sctp.sndbuf_policy=1, that
      is wmem accounting per association, as it guarantees a fair
      share of wmem among associations.
      
      Therefore, if we have reclaimed memory in case of per socket
      accounting, wake all related associations to a socket in a
      fair manner, that is, traverse the socket association list
      starting from the current neighbour of the association and
      issue a __sctp_write_space() to everyone until we end up
      waking ourselves. This guarantees that no association is
      preferred over another and even if more associations are
      taken into the one-to-many session, all receivers will get
      messages from the server and are not stalled forever on
      high load. This setting still leaves the advantage of per
      socket accounting in touch as an association can still use
      up global limits if unused by others.
      
      Fixes: 4eb701df ("[SCTP] Fix SCTP sendbuffer accouting.")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      52c35bef
    • Linus Torvalds's avatar
      Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux · e9f37d3a
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "Highlights:
      
         - drm:
      
           Generic display port aux features, primary plane support, drm
           master management fixes, logging cleanups, enforced locking checks
           (instead of docs), documentation improvements, minor number
           handling cleanup, pseudofs for shared inodes.
      
         - ttm:
      
           add ability to allocate from both ends
      
         - i915:
      
           broadwell features, power domain and runtime pm, per-process
           address space infrastructure (not enabled)
      
         - msm:
      
           power management, hdmi audio support
      
         - nouveau:
      
           ongoing GPU fault recovery, initial maxwell support, random fixes
      
         - exynos:
      
           refactored driver to clean up a lot of abstraction, DP support
           moved into drm, LVDS bridge support added, parallel panel support
      
         - gma500:
      
           SGX MMU support, SGX irq handling, asle irq work fixes
      
         - radeon:
      
           video engine bringup, ring handling fixes, use dp aux helpers
      
         - vmwgfx:
      
           add rendernode support"
      
      * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
        DRM: armada: fix corruption while loading cursors
        drm/dp_helper: don't return EPROTO for defers (v2)
        drm/bridge: export ptn3460_init function
        drm/exynos: remove MODULE_DEVICE_TABLE definitions
        ARM: dts: exynos4412-trats2: enable exynos/fimd node
        ARM: dts: exynos4210-trats: enable exynos/fimd node
        ARM: dts: exynos4412-trats2: add panel node
        ARM: dts: exynos4210-trats: add panel node
        ARM: dts: exynos4: add MIPI DSI Master node
        drm/panel: add S6E8AA0 driver
        ARM: dts: exynos4210-universal_c210: add proper panel node
        drm/panel: add ld9040 driver
        panel/ld9040: add DT bindings
        panel/s6e8aa0: add DT bindings
        drm/exynos: add DSIM driver
        exynos/dsim: add DT bindings
        drm/exynos: disallow fbdev initialization if no device is connected
        drm/mipi_dsi: create dsi devices only for nodes with reg property
        drm/mipi_dsi: add flags to DSI messages
        Skip intel_crt_init for Dell XPS 8700
        ...
      e9f37d3a
    • Dan Carpenter's avatar
      isdnloop: several buffer overflows · 7563487c
      Dan Carpenter authored
      There are three buffer overflows addressed in this patch.
      
      1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
      then copy it into a 60 character buffer.  I have made the destination
      buffer 64 characters and I'm changed the sprintf() to a snprintf().
      
      2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
      character buffer so we have 54 characters.  The ->eazlist[] is 11
      characters long.  I have modified the code to return if the source
      buffer is too long.
      
      3) In isdnloop_command() the cbuf[] array was 60 characters long but the
      max length of the string then can be up to 79 characters.  I made the
      cbuf array 80 characters long and changed the sprintf() to snprintf().
      I also removed the temporary "dial" buffer and changed it to use "p"
      directly.
      
      Unfortunately, we pass the "cbuf" string from isdnloop_command() to
      isdnloop_writecmd() which truncates anything over 60 characters to make
      it fit in card->omsg[].  (It can accept values up to 255 characters so
      long as there is a '\n' character every 60 characters).  For now I have
      just fixed the memory corruption bug and left the other problems in this
      driver alone.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7563487c
    • Heiko Carstens's avatar
    • Catalin Marinas's avatar
      arm64: Fix DMA range invalidation for cache line unaligned buffers · ebf81a93
      Catalin Marinas authored
      If the buffer needing cache invalidation for inbound DMA does start or
      end on a cache line aligned address, we need to use the non-destructive
      clean&invalidate operation. This issue was introduced by commit
      7363590d (arm64: Implement coherent DMA API based on swiotlb).
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarJon Medhurst (Tixy) <tixy@linaro.org>
      ebf81a93
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · a7963eb7
      Linus Torvalds authored
      Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
       "various cleanups for ext2, ext3, udf, isofs, a documentation update
        for quota, and a fix of a race in reiserfs readdir implementation"
      
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        reiserfs: fix race in readdir
        ext2: acl: remove unneeded include of linux/capability.h
        ext3: explicitly remove inode from orphan list after failed direct io
        fs/isofs/inode.c add __init to init_inodecache()
        ext3: Speedup WB_SYNC_ALL pass
        fs/quota/Kconfig: Update filesystems
        ext3: Update outdated comment before ext3_ordered_writepage()
        ext3: Update PF_MEMALLOC handling in ext3_write_inode()
        ext2/3: use prandom_u32() instead of get_random_bytes()
        ext3: remove an unneeded check in ext3_new_blocks()
        ext3: remove unneeded check in ext3_ordered_writepage()
        fs: Mark function as static in ext3/xattr_security.c
        fs: Mark function as static in ext3/dir.c
        fs: Mark function as static in ext2/xattr_security.c
        ext3: Add __init macro to init_inodecache
        ext2: Add __init macro to init_inodecache
        udf: Add __init macro to init_inodecache
        fs: udf: parse_options: blocksize check
      a7963eb7
    • Linus Torvalds's avatar
      Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · b003d770
      Linus Torvalds authored
      Pull kbuild changes from Michal Marek:
       - cleanups in the main Makefiles and Documentation/DocBook/Makefile
       - make O=...  directory is automatically created if needed
       - mrproper/distclean removes the old include/linux/version.h to make
         life easier when bisecting across the commit that moved the version.h
         file
      
      * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: docbook: fix the include error when executing "make help"
        kbuild: create a build directory automatically for out-of-tree build
        kbuild: remove redundant '.*.cmd' pattern from make distclean
        kbuild: move "quote" to Kbuild.include to be consistent
        kbuild: docbook: use $(obj) and $(src) rather than specific path
        kbuild: unconditionally clobber include/linux/version.h on distclean
        kbuild: docbook: specify KERNELDOC dependency correctly
        kbuild: docbook: include cmd files more simply
        kbuild: specify build_docproc as a phony target
      b003d770
    • Linus Torvalds's avatar
      Merge tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 3573d386
      Linus Torvalds authored
      Pull ARC changes from Vineet Gupta:
       - Support for external initrd from Noam
       - Fix broken serial console in nsimosci Virtual Platform
       - Reuse of ENTRY/END assembler macros across hand asm code
       - Other minor fixes here and there
      
      * tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: [nsimosci] Unbork console
        ARC: [nsimosci] Change .dts to use generic 8250 UART
        ARC: [SMP] General Fixes
        ARC: Remove unused DT template file
        ARC: [clockevent] simplify timer ISR
        ARC: [clockevent] can't be SoC specific
        ARC: Remove ARC_HAS_COH_RTSC
        ARC: switch to generic ENTRY/END assembler annotations
        ARC: support external initrd
        ARC: add uImage to .gitignore
        ARC: [arcfpga] Fix __initconst data const-correctness
      3573d386
    • Russell King's avatar
      DRM: armada: fix corruption while loading cursors · c39b0695
      Russell King authored
      Loading cursors to the LCD controller's SRAM can be corrupted when the
      configured pixel clock is relatively slow.  This seems to be caused
      when we write back-to-back to the SRAM registers.
      
      There doesn't appear to be any status register we can read to check
      when an access has completed.
      
      Inserting a dummy read between the writes appears to fix the problem.
      
      Cc: <stable@vger.kernel.org> # 3.13
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      c39b0695
    • Linus Torvalds's avatar
      Merge tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · c8d9762a
      Linus Torvalds authored
      Pull Xen build fix from David Vrabel:
       "Fix arm build of drivers/xen/events/
      
        The merge of irq-core-for-linus branch broke it"
      
      * tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        Xen: do hv callback accounting only on x86
      c8d9762a
  3. 07 Apr, 2014 25 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (incoming from Andrew) · 26c12d93
      Linus Torvalds authored
      Merge second patch-bomb from Andrew Morton:
       - the rest of MM
       - zram updates
       - zswap updates
       - exit
       - procfs
       - exec
       - wait
       - crash dump
       - lib/idr
       - rapidio
       - adfs, affs, bfs, ufs
       - cris
       - Kconfig things
       - initramfs
       - small amount of IPC material
       - percpu enhancements
       - early ioremap support
       - various other misc things
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
        MAINTAINERS: update Intel C600 SAS driver maintainers
        fs/ufs: remove unused ufs_super_block_third pointer
        fs/ufs: remove unused ufs_super_block_second pointer
        fs/ufs: remove unused ufs_super_block_first pointer
        fs/ufs/super.c: add __init to init_inodecache()
        doc/kernel-parameters.txt: add early_ioremap_debug
        arm64: add early_ioremap support
        arm64: initialize pgprot info earlier in boot
        x86: use generic early_ioremap
        mm: create generic early_ioremap() support
        x86/mm: sparse warning fix for early_memremap
        lglock: map to spinlock when !CONFIG_SMP
        percpu: add preemption checks to __this_cpu ops
        vmstat: use raw_cpu_ops to avoid false positives on preemption checks
        slub: use raw_cpu_inc for incrementing statistics
        net: replace __this_cpu_inc in route.c with raw_cpu_inc
        modules: use raw_cpu_write for initialization of per cpu refcount.
        mm: use raw_cpu ops for determining current NUMA node
        percpu: add raw_cpu_ops
        slub: fix leak of 'name' in sysfs_slab_add
        ...
      26c12d93
    • Lukasz Dorau's avatar
      MAINTAINERS: update Intel C600 SAS driver maintainers · fdc5813f
      Lukasz Dorau authored
      Signed-off-by: default avatarLukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarMaciej Patelczyk <maciej.patelczyk@intel.com>
      Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fdc5813f
    • Christian Engelmayer's avatar
      fs/ufs: remove unused ufs_super_block_third pointer · fe4487d1
      Christian Engelmayer authored
      Pointer 'usb3' to struct ufs_super_block_third acquired via
      ubh_get_usb_third() is never used in function
      ufs_read_cylinder_structures().  Thus remove it.
      
      Detected by Coverity: CID 139939.
      Signed-off-by: default avatarChristian Engelmayer <cengelma@gmx.at>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fe4487d1
    • Christian Engelmayer's avatar
      fs/ufs: remove unused ufs_super_block_second pointer · 48968a11
      Christian Engelmayer authored
      Pointer 'usb2' to struct ufs_super_block_second acquired via
      ubh_get_usb_second() is never used in function ufs_statfs().  Thus
      remove it.
      
      Detected by Coverity: CID 139940.
      Signed-off-by: default avatarChristian Engelmayer <cengelma@gmx.at>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48968a11
    • Christian Engelmayer's avatar
      fs/ufs: remove unused ufs_super_block_first pointer · 6e0bd34c
      Christian Engelmayer authored
      Remove occurences of unused pointers to struct ufs_super_block_first
      that were acquired via ubh_get_usb_first().
      
      Detected by Coverity: CID 139929 - CID 139936, CID 139940.
      Signed-off-by: default avatarChristian Engelmayer <cengelma@gmx.at>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e0bd34c
    • Fabian Frederick's avatar
      fs/ufs/super.c: add __init to init_inodecache() · 76ee4735
      Fabian Frederick authored
      init_inodecache is only called by __init init_ufs_fs.
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      76ee4735
    • Mark Salter's avatar
      doc/kernel-parameters.txt: add early_ioremap_debug · 56aeeba8
      Mark Salter authored
      Add description of early_ioremap_debug kernel parameter.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      56aeeba8
    • Mark Salter's avatar
      arm64: add early_ioremap support · bf4b558e
      Mark Salter authored
      Add support for early IO or memory mappings which are needed before the
      normal ioremap() is usable.  This also adds fixmap support for permanent
      fixed mappings such as that used by the earlyprintk device register
      region.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bf4b558e
    • Mark Salter's avatar
      arm64: initialize pgprot info earlier in boot · 0bf757c7
      Mark Salter authored
      Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
      values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.
      
      The new fixmap and early_ioremap support also needs to use these macros
      before paging_init() is called.  This patch moves the init_mem_pgprot()
      call out of paging_init() and into setup_arch() so that pgprot_default
      gets initialized in time for fixmap and early_ioremap.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0bf757c7
    • Mark Salter's avatar
      x86: use generic early_ioremap · 5b7c73e0
      Mark Salter authored
      Move x86 over to the generic early ioremap implementation.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b7c73e0
    • Mark Salter's avatar
      mm: create generic early_ioremap() support · 9e5c33d7
      Mark Salter authored
      This patch creates a generic implementation of early_ioremap() support
      based on the existing x86 implementation.  early_ioremp() is useful for
      early boot code which needs to temporarily map I/O or memory regions
      before normal mapping functions such as ioremap() are available.
      
      Some architectures have optional MMU.  In the no-MMU case, the remap
      functions simply return the passed in physical address and the unmap
      functions do nothing.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e5c33d7
    • Dave Young's avatar
      x86/mm: sparse warning fix for early_memremap · 6b550f6f
      Dave Young authored
      This patch series takes the common bits from the x86 early ioremap
      implementation and creates a generic implementation which may be used by
      other architectures.  The early ioremap interfaces are intended for
      situations where boot code needs to make temporary virtual mappings
      before the normal ioremap interfaces are available.  Typically, this
      means before paging_init() has run.
      
      This patch (of 6):
      
      There's a lot of sparse warnings for code like below: void *a =
      early_memremap(phys_addr, size);
      
      early_memremap intend to map kernel memory with ioremap facility, the
      return pointer should be a kernel ram pointer instead of iomem one.
      
      For making the function clearer and supressing sparse warnings this patch
      do below two things:
      1. cast to (__force void *) for the return value of early_memremap
      2. add early_memunmap function and pass (__force void __iomem *) to iounmap
      
      From Boris:
        "Ingo told me yesterday, it makes sense too.  I'd guess we can try it.
         FWIW, all callers of early_memremap use the memory they get remapped
         as normal memory so we should be safe"
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6b550f6f
    • Josh Triplett's avatar
      lglock: map to spinlock when !CONFIG_SMP · 64b47e8f
      Josh Triplett authored
      When the system has only one CPU, lglock is effectively a spinlock; map
      it directly to spinlock to eliminate the indirection and duplicate code.
      
      In addition to removing overhead, this drops 1.6k of code with a
      defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      64b47e8f
    • Christoph Lameter's avatar
      percpu: add preemption checks to __this_cpu ops · 188a8140
      Christoph Lameter authored
      We define a check function in order to avoid trouble with the include
      files.  Then the higher level __this_cpu macros are modified to invoke
      the preemption check.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      188a8140
    • Christoph Lameter's avatar
      vmstat: use raw_cpu_ops to avoid false positives on preemption checks · 293b6a4c
      Christoph Lameter authored
      vm counters are allowed to be racy.  Use raw_cpu_ops to avoid the
      local_irq_disable overhead and to avoid preemption checks which will be
      added to the __this_cpu operations.
      
      [akpm@linux-foundation.org: Add comment.  Again.]
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Reported-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      293b6a4c
    • Christoph Lameter's avatar
      slub: use raw_cpu_inc for incrementing statistics · 88da03a6
      Christoph Lameter authored
      Statistics are not critical to the operation of the allocation but
      should also not cause too much overhead.
      
      When __this_cpu_inc is altered to check if preemption is disabled this
      triggers.  Use raw_cpu_inc to avoid the checks.  Using this_cpu_ops may
      cause interrupt disable/enable sequences on various arches which may
      significantly impact allocator performance.
      
      [akpm@linux-foundation.org: add comment]
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      88da03a6
    • Christoph Lameter's avatar
      net: replace __this_cpu_inc in route.c with raw_cpu_inc · 3ed66e91
      Christoph Lameter authored
      The RT_CACHE_STAT_INC macro triggers the new preemption checks
      for __this_cpu ops.
      
      I do not see any other synchronization that would allow the use of a
      __this_cpu operation here however in commit dbd2915c ("[IPV4]:
      RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
      raw_smp_processor_id() here because "we do not care" about races.  In
      the past we agreed that the price of disabling interrupts here to get
      consistent counters would be too high.  These counters may be inaccurate
      due to race conditions.
      
      The use of __this_cpu op improves the situation already from what commit
      dbd2915c did since the single instruction emitted on x86 does not
      allow the race to occur anymore.  However, non x86 platforms could still
      experience a race here.
      
      Trace:
      
        __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          __ip_route_output_key+0x575/0x8c0
          ip_route_output_flow+0x27/0x70
          udp_sendmsg+0x825/0xa20
          inet_sendmsg+0x85/0xc0
          sock_sendmsg+0x9c/0xd0
          ___sys_sendmsg+0x37c/0x390
          __sys_sendmsg+0x49/0x90
          SyS_sendmsg+0x12/0x20
          tracesys+0xe1/0xe6
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3ed66e91
    • Christoph Lameter's avatar
      modules: use raw_cpu_write for initialization of per cpu refcount. · 08f141d3
      Christoph Lameter authored
      The initialization of a structure is not subject to synchronization.
      The use of __this_cpu would trigger a false positive with the additional
      preemption checks for __this_cpu ops.
      
      So simply disable the check through the use of raw_cpu ops.
      
      Trace:
      
        __this_cpu_write operation in preemptible [00000000] code: modprobe/286
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          dump_stack+0x4e/0x82
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          load_module+0xcfd/0x2650
          SyS_init_module+0xa6/0xd0
          tracesys+0xe1/0xe6
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      08f141d3
    • Christoph Lameter's avatar
      mm: use raw_cpu ops for determining current NUMA node · dc322a99
      Christoph Lameter authored
      With the preempt checking logic for __this_cpu_ops we will get false
      positives from locations in the code that use numa_node_id.
      
      Before the __this_cpu ops where introduced there were no checks for
      preemption present either.  smp_raw_processor_id() was used.  See
      
        http://www.spinics.net/lists/linux-numa/msg00641.html
      
      Therefore we need to use raw_cpu_read here to avoid false postives.
      
      Note that this issue has been discussed in prior years.  If the process
      changes nodes after retrieving the current numa node then that is
      acceptable since most uses of numa_node etc are for optimization and not
      for correctness.
      
      There were suggestions to implement a raw_numa_node_id in order to do
      preempt checks for numa_node_id as well.  But I think we better defer
      that to another patch since that would mean investigating how
      numa_node_id() is used throughout the kernel which would increase the
      scope of this patchset significantly.  After all preemption was never
      checked before when numa_node_id() was used.
      
      Some sample traces:
      
      __this_cpu_read operation in preemptible [00000000] code: login/1456
      caller is __this_cpu_preempt_check+0x2b/0x2d
      CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3b-dirty #185
      Call Trace:
        dump_stack+0x4e/0x82
        check_preemption_disabled+0xc5/0xe0
        __this_cpu_preempt_check+0x2b/0x2d
        get_task_policy+0x1d/0x49
        get_vma_policy+0x14/0x76
        alloc_pages_vma+0x35/0xff
        handle_mm_fault+0x290/0x73b
        __do_page_fault+0x3fe/0x44d
        do_page_fault+0x9/0xc
        page_fault+0x22/0x30
        generic_file_aio_read+0x38e/0x624
        do_sync_read+0x54/0x73
        vfs_read+0x9d/0x12a
        SyS_read+0x47/0x7e
        cstar_dispatch+0x7/0x23
      
      caller is __this_cpu_preempt_check+0x2b/0x2d
      CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3b-dirty #185
      Call Trace:
        dump_stack+0x4e/0x82
        check_preemption_disabled+0xc5/0xe0
        __this_cpu_preempt_check+0x2b/0x2d
        alloc_pages_current+0x8f/0xbc
        __page_cache_alloc+0xb/0xd
        __do_page_cache_readahead+0xf4/0x219
        ra_submit+0x1c/0x20
        ondemand_readahead+0x28c/0x2b4
        page_cache_sync_readahead+0x38/0x3a
        generic_file_aio_read+0x261/0x624
        do_sync_read+0x54/0x73
        vfs_read+0x9d/0x12a
        SyS_read+0x47/0x7e
        cstar_dispatch+0x7/0x23
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dc322a99
    • Christoph Lameter's avatar
      percpu: add raw_cpu_ops · b3ca1c10
      Christoph Lameter authored
      The kernel has never been audited to ensure that this_cpu operations are
      consistently used throughout the kernel.  The code generated in many
      places can be improved through the use of this_cpu operations (which
      uses a segment register for relocation of per cpu offsets instead of
      performing address calculations).
      
      The patch set also addresses various consistency issues in general with
      the per cpu macros.
      
      A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
         because checks are skipped. This is typically shown through a raw_
         prefix. So this patch set changes the places where __this_cpu_ptr()
         is used to raw_cpu_ptr().
      
      B. There has been the long term wish by some that __this_cpu operations
         would check for preemption. However, there are cases where preemption
         checks need to be skipped. This patch set adds raw_cpu operations that
         do not check for preemption and then adds preemption checks to the
         __this_cpu operations.
      
      C. The use of __get_cpu_var is always a reference to a percpu variable
         that can also be handled via a this_cpu operation. This patch set
         replaces all uses of __get_cpu_var with this_cpu operations.
      
      D. We can then use this_cpu RMW operations in various places replacing
         sequences of instructions by a single one.
      
      E. The use of this_cpu operations throughout will allow other arches than
         x86 to implement optimized references and RMV operations to work with
         per cpu local data.
      
      F. The use of this_cpu operations opens up the possibility to
         further optimize code that relies on synchronization through
         per cpu data.
      
      The patch set works in a couple of stages:
      
      I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
          Also converts the existing __this_cpu_xx_# primitive in the x86
          code to raw_cpu_xx_#.
      
      II. Patch 2-4 use the raw_cpu operations in places that would give
           us false positives once they are enabled.
      
      III. Patch 5 adds preemption checks to __this_cpu operations to allow
          checking if preemption is properly disabled when these functions
          are used.
      
      IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
         with this_cpu_ptr. They do not depend on any changes to the percpu
         code. No preemption tests are skipped if they are applied.
      
      V. Patches 21-46 are conversion patches that use this_cpu operations
         in various kernel subsystems/drivers or arch code.
      
      VI.  Patches 47/48 (not included in this series) remove no longer used
          functions (__this_cpu_ptr and __get_cpu_var).  These should only be
          applied after all the conversion patches have made it and after we
          have done additional passes through the kernel to ensure that none of
          the uses of these functions remain.
      
      This patch (of 46):
      
      The patches following this one will add preemption checks to __this_cpu
      ops so we need to have an alternative way to use this_cpu operations
      without preemption checks.
      
      raw_cpu_ops will be the basis for all other ops since these will be the
      operations that do not implement any checks.
      
      Primitive operations are renamed by this patch from __this_cpu_xxx to
      raw_cpu_xxxx.
      
      Also change the uses of the x86 percpu primitives in preempt.h.
      These depend directly on asm/percpu.h (header #include nesting issue).
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bryan Wu <cooloney@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3ca1c10
    • Dave Jones's avatar
      slub: fix leak of 'name' in sysfs_slab_add · 54b6a731
      Dave Jones authored
      The failure paths of sysfs_slab_add don't release the allocation of
      'name' made by create_unique_id() a few lines above the context of the
      diff below.  Create a common exit path to make it more obvious what
      needs freeing.
      
      [vdavydov@parallels.com: free the name only if !unmergeable]
      Signed-off-by: default avatarDave Jones <davej@fedoraproject.org>
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      54b6a731
    • Vladimir Davydov's avatar
      slub: rework sysfs layout for memcg caches · 9a41707b
      Vladimir Davydov authored
      Currently, we try to arrange sysfs entries for memcg caches in the same
      manner as for global caches.  Apart from turning /sys/kernel/slab into a
      mess when there are a lot of kmem-active memcgs created, it actually
      does not work properly - we won't create more than one link to a memcg
      cache in case its parent is merged with another cache.  For instance, if
      A is a root cache merged with another root cache B, we will have the
      following sysfs setup:
      
        X
        A -> X
        B -> X
      
      where X is some unique id (see create_unique_id()).  Now if memcgs M and
      N start to allocate from cache A (or B, which is the same), we will get:
      
        X
        X:M
        X:N
        A -> X
        B -> X
        A:M -> X:M
        A:N -> X:N
      
      Since B is an alias for A, we won't get entries B:M and B:N, which is
      confusing.
      
      It is more logical to have entries for memcg caches under the
      corresponding root cache's sysfs directory.  This would allow us to keep
      sysfs layout clean, and avoid such inconsistencies like one described
      above.
      
      This patch does the trick.  It creates a "cgroup" kset in each root
      cache kobject to keep its children caches there.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a41707b
    • Vladimir Davydov's avatar
      slub: adjust memcg caches when creating cache alias · 84d0ddd6
      Vladimir Davydov authored
      Otherwise, kzalloc() called from a memcg won't clear the whole object.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      84d0ddd6
    • Vladimir Davydov's avatar
      memcg, slab: do not destroy children caches if parent has aliases · b8529907
      Vladimir Davydov authored
      Currently we destroy children caches at the very beginning of
      kmem_cache_destroy().  This is wrong, because the root cache will not
      necessarily be destroyed in the end - if it has aliases (refcount > 0),
      kmem_cache_destroy() will simply decrement its refcount and return.  In
      this case, at best we will get a bunch of warnings in dmesg, like this
      one:
      
        kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
        CPU: 1 PID: 7139 Comm: modprobe Tainted: G    B   W    3.13.0+ #117
        Call Trace:
          dump_stack+0x49/0x5b
          kmem_cache_destroy+0xdf/0xf0
          kmem_cache_destroy_memcg_children+0x97/0xc0
          kmem_cache_destroy+0xf/0xf0
          xfs_mru_cache_uninit+0x21/0x30 [xfs]
          exit_xfs_fs+0x2e/0xc44 [xfs]
          SyS_delete_module+0x198/0x1f0
          system_call_fastpath+0x16/0x1b
      
      At worst - if kmem_cache_destroy() will race with an allocation from a
      memcg cache - the kernel will panic.
      
      This patch fixes this by moving children caches destruction after the
      check if the cache has aliases.  Plus, it forbids destroying a root
      cache if it still has children caches, because each children cache keeps
      a reference to its parent.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b8529907
    • Vladimir Davydov's avatar
      memcg, slab: unregister cache from memcg before starting to destroy it · 051dd460
      Vladimir Davydov authored
      Currently, memcg_unregister_cache(), which deletes the cache being
      destroyed from the memcg_slab_caches list, is called after
      __kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to
      destroy the cache.
      
      As a result, one can access a partially destroyed cache while traversing
      a memcg_slab_caches list, which can have deadly consequences (for
      instance, cache_show() called for each cache on a memcg_slab_caches list
      from mem_cgroup_slabinfo_read() will dereference pointers to already
      freed data).
      
      To fix this, let's move memcg_unregister_cache() before the cache
      destruction process beginning, issuing memcg_register_cache() on failure.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      051dd460