  1. 04 Jan, 2016 3 commits
    • udf: Prevent buffer overrun with multi-byte characters · ad402b26
      Andrew Gabbasov authored
      udf_CS0toUTF8 function stops the conversion when the output buffer
      length reaches UDF_NAME_LEN-2, which is correct maximum name length,
      but, when checking, it leaves the space for a single byte only,
      while multi-bytes output characters can take more space, causing
      buffer overflow.
      
      Similar error exists in udf_CS0toNLS function, that restricts
      the output length to UDF_NAME_LEN, while actual maximum allowed
      length is UDF_NAME_LEN-2.
      
      In these cases the output can override not only the current buffer
      length field, causing corruption of the name buffer itself, but also
      following allocation structures, causing kernel crash.
      
      Adjust the output length checks in both functions to prevent buffer
      overruns in case of multi-bytes UTF8 or NLS characters.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      ad402b26
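      The overrun described in the udf commit above comes from reserving
      room for a single byte when the next output character may need more.
      A minimal sketch of the corrected check, in plain C rather than the
      kernel's own code: `utf8_len`, `encode_bounded`, and `NAME_MAX_OUT`
      are hypothetical names, the limit of 8 merely stands in for
      UDF_NAME_LEN-2, and surrogate pairs are ignored for brevity.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical output limit, standing in for UDF_NAME_LEN - 2. */
      #define NAME_MAX_OUT 8

      /* Number of UTF-8 bytes needed for one 16-bit character. */
      static size_t utf8_len(uint16_t c)
      {
          if (c < 0x80)
              return 1;
          if (c < 0x800)
              return 2;
          return 3;
      }

      /*
       * Encode 16-bit characters into out[], stopping before any character
       * whose full encoding would not fit.  Returns the bytes written.
       */
      static size_t encode_bounded(const uint16_t *in, size_t n, uint8_t *out)
      {
          size_t used = 0;

          for (size_t i = 0; i < n; i++) {
              size_t need = utf8_len(in[i]);

              /* Check room for the whole character, not just one byte. */
              if (used + need > NAME_MAX_OUT)
                  break;

              if (need == 1) {
                  out[used++] = (uint8_t)in[i];
              } else if (need == 2) {
                  out[used++] = (uint8_t)(0xC0 | (in[i] >> 6));
                  out[used++] = (uint8_t)(0x80 | (in[i] & 0x3F));
              } else {
                  out[used++] = (uint8_t)(0xE0 | (in[i] >> 12));
                  out[used++] = (uint8_t)(0x80 | ((in[i] >> 6) & 0x3F));
                  out[used++] = (uint8_t)(0x80 | (in[i] & 0x3F));
              }
          }
          return used;
      }
      ```

      With three 3-byte characters and an 8-byte limit, only two are
      written: the third would end at byte 9, so the loop stops at 6.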
    • quota: constify qtree_fmt_operations structures · d1b98c23
      Julia Lawall authored
      The qtree_fmt_operations structures are never modified, so declare them as
      const.
      
      Done with the help of Coccinelle.
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Jan Kara <jack@suse.cz>
      d1b98c23
    • udf: avoid uninitialized variable use · 4f1b1519
      Arnd Bergmann authored
      A new warning has come up from a recent cleanup:
      
      fs/udf/inode.c: In function 'udf_setup_indirect_aext':
      fs/udf/inode.c:1927:28: warning: 'adsize' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      If the alloc_type is neither ICBTAG_FLAG_AD_SHORT nor
      ICBTAG_FLAG_AD_LONG, the value of adsize is undefined. Currently,
      callers of these functions make sure alloc_type is one of the two valid
      ones but for future proofing make sure we handle the case of invalid
      alloc type as well.  This changes the code to return -EIO in that case.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: fcea62ba ("udf: Factor out code for creating indirect extent")
      Signed-off-by: Jan Kara <jack@suse.cz>
      4f1b1519
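      The pattern behind the warning fix above can be sketched as follows.
      This is an illustration, not the kernel function: `descriptor_size`,
      the `alloc_type` enum, and the sizes are assumptions made for the
      example.  The point is that every path either sets the size or
      returns an error.

      ```c
      #include <errno.h>
      #include <stddef.h>

      /* Hypothetical stand-ins for ICBTAG_FLAG_AD_SHORT / ICBTAG_FLAG_AD_LONG. */
      enum alloc_type { AD_SHORT = 0, AD_LONG = 1 };

      /*
       * Store the descriptor size for a known allocation type in *adsize.
       * Returns 0 on success, or -EIO for any other type, so the caller
       * never reads an uninitialized size.
       */
      static int descriptor_size(int alloc_type, size_t *adsize)
      {
          switch (alloc_type) {
          case AD_SHORT:
              *adsize = 8;    /* illustrative size, an assumption */
              return 0;
          case AD_LONG:
              *adsize = 16;   /* illustrative size, an assumption */
              return 0;
          default:
              return -EIO;    /* invalid type: fail instead of guessing */
          }
      }
      ```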
  2. 23 Dec, 2015 3 commits
    • udf: Fix lost indirect extent block · 6c371578
      Jan Kara authored
      When inode ends with empty indirect extent block and we extended that
      file, udf_do_extend_file() ended up just overwriting pointer to it with
      another extent and thus effectively leaking the block and also
      corrupting the length of allocation descriptors.
      
      Fix the problem by properly following into next indirect extent when it
      is present.
      Signed-off-by: Jan Kara <jack@suse.cz>
      6c371578
    • udf: Factor out code for creating indirect extent · fcea62ba
      Jan Kara authored
      Factor out code for creating indirect extent from udf_add_aext(). It was
      mostly duplicated in two places. Also remove some opencoded versions
      of udf_write_aext().
      Signed-off-by: Jan Kara <jack@suse.cz>
      fcea62ba
    • udf: limit the maximum number of indirect extents in a row · b0918d9f
      Vegard Nossum authored
      udf_next_aext() just follows extent pointers while extents are marked as
      indirect. This can loop forever on a corrupted filesystem. Limit the
      number of indirect extents we are willing to follow in a row.
      
      [JK: Updated changelog, limit, style]
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: stable@vger.kernel.org
      Cc: Jan Kara <jack@suse.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      b0918d9f
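      A bounded pointer walk of the kind described in the commit above
      might look like this.  It is a sketch under stated assumptions: the
      `next[]` array models indirect extent pointers, and both
      `follow_indirect` and the cap `MAX_INDIRECT_HOPS` are hypothetical
      names, not the kernel's.

      ```c
      #include <errno.h>

      /* Hypothetical cap on consecutive indirect hops. */
      #define MAX_INDIRECT_HOPS 16

      /*
       * next[i] is the block that block i points to, or -1 if block i is
       * not indirect.  Returns the final block, or -EIO once the walk
       * exceeds the cap, which is treated as on-disk corruption.
       */
      static int follow_indirect(const int *next, int start)
      {
          int block = start;
          int hops = 0;

          while (next[block] >= 0) {
              if (++hops > MAX_INDIRECT_HOPS)
                  return -EIO;
              block = next[block];
          }
          return block;
      }
      ```

      A two-block cycle, which would otherwise spin forever, now ends with
      an error after the cap is reached.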
  3. 14 Dec, 2015 4 commits
    • udf: limit the maximum number of TD redirections · e7a4eb86
      Vegard Nossum authored
      Filesystem fuzzing revealed that we could get stuck in the
      udf_process_sequence() loop.
      
      The maximum limit was chosen arbitrarily but fixes the problem I saw.
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      e7a4eb86
    • fs: make quota/dquot.c explicitly non-modular · 331221fa
      Paul Gortmaker authored
      The Kconfig currently controlling compilation of this code is:
      
      config QUOTA
              bool "Quota support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Let's remove the couple of traces of modularity so that when reading the
      driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular
      case, the init ordering gets bumped to one level earlier when we
      use the more appropriate fs_initcall here.  However we've made similar
      changes before without any fallout and none is expected here either.
      
      We don't delete module.h because the code in turn tries to load other
      modules as appropriate and so it still needs that header.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      331221fa
    • fs: make quota/netlink.c explicitly non-modular · 7da54463
      Paul Gortmaker authored
      The Kconfig currently controlling compilation of this code is:
      
      config QUOTA_NETLINK_INTERFACE
              bool "Report quota messages through netlink interface"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Let's remove the couple of traces of modularity so that when reading the
      driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular
      case, the init ordering gets bumped to one level earlier when we
      use the more appropriate fs_initcall here.  However we've made similar
      changes before without any fallout and none is expected here either.
      
      Cc: Jan Kara <jack@suse.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      7da54463
    • Linux 4.4-rc5 · 9f9499ae
      Linus Torvalds authored
      9f9499ae
  4. 13 Dec, 2015 8 commits
    • sched/wait: Fix the signal handling fix · dfd01f02
      Peter Zijlstra authored
      Jan Stancek reported that I wrecked things for him by fixing things for
      Vladimir :/
      
      His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
      should not be possible, however my previous patch made this possible by
      unconditionally checking signal_pending().
      
      We cannot use current->state as was done previously, because it can be
      changed as early as the instruction after the store to that variable.
      We must instead pass the initial state along and use that.
      
      Fixes: 68985633 ("sched/wait: Fix signal handling in bit wait helpers")
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Reported-by: Chris Mason <clm@fb.com>
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Tested-by: Chris Mason <clm@fb.com>
      Reviewed-by: Paul Turner <pjt@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: tglx@linutronix.de
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: hpa@zytor.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dfd01f02
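      The idea in the sched/wait commit above, deciding on the state the
      wait was started with rather than on a value that may have changed
      since, can be sketched in isolation.  The names `wait_result`,
      `STATE_INTERRUPTIBLE`, and `STATE_UNINTERRUPTIBLE` are hypothetical
      and only model the behaviour described; this is not the kernel's
      helper.

      ```c
      #include <errno.h>
      #include <stdbool.h>

      /* Hypothetical wait modes for the sketch. */
      #define STATE_INTERRUPTIBLE   1
      #define STATE_UNINTERRUPTIBLE 2

      /*
       * Decide how a wait that was started in `mode` should react to a
       * pending signal.  Only an interruptible wait may return -EINTR;
       * the mode is passed in by the caller instead of being re-read
       * from mutable task state.
       */
      static int wait_result(int mode, bool signal_pending)
      {
          if (mode == STATE_INTERRUPTIBLE && signal_pending)
              return -EINTR;
          return 0;
      }
      ```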
    • Merge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · fc891828
      Linus Torvalds authored
      Pull NFS client bugfix from Trond Myklebust:
       "SUNRPC: Fix a NFSv4.1 callback channel regression"
      
      * tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        SUNRPC: Fix callback channel
      fc891828
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dec9cbf9
      Linus Torvalds authored
      Pull timer fixlets from Thomas Gleixner:
       "Two trivial fixes which add missing header files and forward
        declarations so the code will compile even when the magic include
        chains are different"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3: Add missing include for barrier.h
        irqchip/gic-v3: Add missing struct device_node declaration
      dec9cbf9
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 43afc99d
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A single fix to unbreak a clocksource driver which has more than 32bit
        counter width"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Mmio: remove artificial 32bit limitation
      43afc99d
    • Merge tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · f17ef495
      Linus Torvalds authored
      Pull fpga driver fixes from Greg KH:
       "Only two small fpga driver fixes here, both have been in linux-next
        for a while, and resolve some reported issues"
      
      * tag 'char-misc-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        fpga manager: Fix firmware resource leak on error
        fpga manager: remove label
      f17ef495
    • Merge tag 'staging-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · b24f74e3
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are a few staging and IIO driver fixes for 4.4-rc5.
      
        All of them resolve reported problems and have been in linux-next for
        a while.  Nothing major here, just small fixes where needed"
      
      * tag 'staging-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: lustre: echo_copy.._lsm() dereferences userland pointers directly
        iio: adc: spmi-vadc: add missing of_node_put
        iio: fix some warning messages
        iio: light: apds9960: correct ->last_busy count
        iio: lidar: return -EINVAL on invalid signal
        staging: iio: dummy: complete IIO events delivery to userspace
      b24f74e3
    • Merge tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c474009c
      Linus Torvalds authored
      Pull USB driver fixes from Greg KH:
       "Here are a number of small USB fixes for 4.4-rc5.  All of them have
        been in linux-next.  The majority are gadget and phy issues, with a
        few new quirks and device ids added as well"
      
      * tag 'usb-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (32 commits)
        USB: add quirk for devices with broken LPM
        xhci: fix usb2 resume timing and races.
        usb: musb: fail with error when no DMA controller set
        usb: gadget: uvc: fix permissions of configfs attributes
        usb: musb: core: Fix pm runtime for deferred probe
        usb: phy: msm: fix a possible NULL dereference
        USB: host: ohci-at91: fix a crash in ohci_hcd_at91_overcurrent_irq
        usb: Quiet down false peer failure messages
        usb: xhci: fix config fail of FS hub behind a HS hub with MTT
        xhci: Fix memory leak in xhci_pme_acpi_rtd3_enable()
        usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message
        USB: whci-hcd: add check for dma mapping error
        usb: core : hub: Fix BOS 'NULL pointer' kernel panic
        USB: quirks: Apply ALWAYS_POLL to all ELAN devices
        usb-storage: Fix scsi-sd failure "Invalid field in cdb" for USB adapter JMicron
        USB: quirks: Fix another ELAN touchscreen
        usb: dwc3: gadget: don't prestart interrupt endpoints
        USB: serial: Another Infineon flash loader USB ID
        USB: cdc_acm: Ignore Infineon Flash Loader utility
        USB: cp210x: Remove CP2110 ID from compatibility list
        ...
      c474009c
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 097b285d
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Here are a bunch of small bug fixes for various ARM platforms, nothing
        really sticks out this week; most of them either fix bugs in code that
        was just added in 4.4, or that has been broken for many years without
        anyone noticing.
      
        at91/sama5d2:
         - fix sama5d2 hardware setup of sd/mmc interface
         - proper selection of pinctrl drivers.  PIO4 is necessary for sama5d2
      
        berlin:
         - fix incorrect clock input for SDIO
      
        exynos:
         - Fix potential NULL pointer dereference in Exynos PMU driver.
      
        imx:
         - Fix vf610 SAI clock configuration bug which is discovered by the
           newly added master mode support in SAI audio driver.
         - Fix buggy L2 cache latency values in vf610 device trees, which may
           cause system hang when cpu runs at a higher frequency.
      
        ixp4xx:
         - fix prototypes for readl/writel functions
      
        ls2080a:
         - use little-endian register access for GPIO and SDHCI
      
        omap:
         - Fix clock source for ARM TWD and global timers on am437x
         - Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when
           MACH_OMAP3_PANDORA is selected
         - Fix SPI DMA handles for dm816x as only some were mapped
         - Fix up mbox cells for dm816x to make mailbox usable
      
        pxa:
         - use PWM lookup table for all ezx machines
      
        s3c24xx:
         - Remove incorrect __init annotation from s3c24xx cpufreq driver
           structures.
      
        versatile:
         - fix PCI IRQ mapping on Versatile PB"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ls2080a/dts: Add little endian property for GPIO IP block
        dt-bindings: define little-endian property for QorIQ GPIO
        ARM64: dts: ls2080a: fix eSDHC endianness
        ARM: dts: vf610: use reset values for L2 cache latencies
        ARM: pxa: use PWM lookup table for all machines
        ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1
        ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock
        ARM: dts: am4372: fix clock source for arm twd and global timers
        ARM: at91: fix pinctrl driver selection
        ARM: at91/dt: add always-on to 1.8V regulator
        ARM: dts: vf610: fix clock definition for SAI2
        ARM: imx: clk-vf610: fix SAI clock tree
        ARM: ixp4xx: fix read{b,w,l} return types
        irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PB
        ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE
        ARM: dts: add dm816x missing spi DT dma handles
        ARM: dts: add dm816x missing #mbox-cells
        cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init
        ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
      097b285d
  5. 12 Dec, 2015 22 commits
    • Merge tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 79dbddaf
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - opal-irqchip: Fix double endian conversion from Alistair Popple
       - cxl: Set endianess of kernel contexts from Frederic Barrat
       - sbc8641: drop bogus PHY IRQ entries from DTS file from Paul Gortmaker
       - Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" from Andrew
         Donnellan
      
      * tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"
        powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file
        cxl: Set endianess of kernel contexts
        powerpc/opal-irqchip: Fix double endian conversion
      79dbddaf
    • Merge branch 'akpm' (patches from Andrew) · 800f1ac4
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "17 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        MIPS: fix DMA contiguous allocation
        sh64: fix __NR_fgetxattr
        ocfs2: fix SGID not inherited issue
        mm/oom_kill.c: avoid attempting to kill init sharing same memory
        drivers/base/memory.c: prohibit offlining of memory blocks with missing sections
        tmpfs: fix shmem_evict_inode() warnings on i_blocks
        mm/hugetlb.c: fix resv map memory leak for placeholder entries
        mm: hugetlb: call huge_pte_alloc() only if ptep is null
        kernel: remove stop_machine() Kconfig dependency
        mm: kmemleak: mark kmemleak_init prototype as __init
        mm: fix kerneldoc on mem_cgroup_replace_page
        osd fs: __r4w_get_page rely on PageUptodate for uptodate
        MAINTAINERS: make Vladimir co-maintainer of the memory controller
        mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
        mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
        memcg: fix memory.high target
        mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
      800f1ac4
    • Merge branch 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · a971526e
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "Fix the boot crash on Mako machines with Huge Pages, prevent a panic
        with SATA controllers (and others) by correctly calculating the IOMMU
        space, hook up the mlock2 syscall and drop unneeded code in the parisc
        pci code"
      
      * 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Disable huge pages on Mako machines
        parisc: Wire up mlock2 syscall
        parisc: Remove unused pcibios_init_bus()
        parisc iommu: fix panic due to trying to allocate too large region
      a971526e
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 78075631
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "A set of fixes for the current series.  This contains:
      
         - A bunch of fixes for lightnvm, should be the last round for this
           series.  From Matias and Wenwei.
      
         - A writeback detach inode fix from Ilya, also marked for stable.
      
         - A block (though it says SCSI) fix for an OOPS in SCSI runtime power
           management.
      
         - Module init error path fixes for null_blk from Minfei"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        null_blk: Fix error path in module initialization
        lightnvm: do not compile in debugging by default
        lightnvm: prevent gennvm module unload on use
        lightnvm: fix media mgr registration
        lightnvm: replace req queue with nvmdev for lld
        lightnvm: comments on constants
        lightnvm: check mm before use
        lightnvm: refactor spin_unlock in gennvm_get_blk
        lightnvm: put blks when luns configure failed
        lightnvm: use flags in rrpc_get_blk
        block: detach bdev inode from its wb in __blkdev_put()
        SCSI: Fix NULL pointer dereference in runtime PM
      78075631
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 6539756e
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Update the linker script to use L1_CACHE_BYTES instead of hard-coded
         64.  We recently changed L1_CACHE_BYTES to 128
      
       - Improve race condition reporting on set_pte_at() and change the BUG
         to WARN_ONCE.  With hardware update of the accessed/dirty state, we
         need to ensure that set_pte_at() does not inadvertently override
         hardware updated state.  The patch also makes the checks ignore
         !pte_valid() new entries
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Improve error reporting on set_pte_at() checks
        arm64: update linker script to increased L1_CACHE_BYTES value
      6539756e
    • MIPS: fix DMA contiguous allocation · 9530d0fe
      Qais Yousef authored
      Recent changes to how GFP_ATOMIC is defined seem to have broken the
      condition to use mips_alloc_from_contiguous() in
      mips_dma_alloc_coherent().
      
      I couldn't bottom out the exact change but I think it's this commit
      d0164adc ("mm, page_alloc: distinguish between being unable to
      sleep, unwilling to sleep and avoiding waking kswapd").
      
      GFP_ATOMIC has multiple bits set and the check for !(gfp & GFP_ATOMIC)
      isn't enough.
      
      The reason behind this condition is to check whether we can potentially
      do a sleeping memory allocation.  Use gfpflags_allow_blocking() instead
      which should be more robust.
      Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9530d0fe
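      Why testing against a multi-bit mask breaks, as the MIPS commit above
      describes, can be shown with a small model.  The flag names and bit
      values below are hypothetical and are not the kernel's definitions;
      only the shape of the problem is taken from the commit message.

      ```c
      #include <stdbool.h>

      /* Hypothetical flag bits, not the kernel's actual values. */
      #define F_HIGH            0x1u
      #define F_KSWAPD_RECLAIM  0x2u
      #define F_DIRECT_RECLAIM  0x4u

      /* Composite masks that share a bit, as multi-bit masks often do. */
      #define MASK_ATOMIC  (F_HIGH | F_KSWAPD_RECLAIM)
      #define MASK_KERNEL  (F_DIRECT_RECLAIM | F_KSWAPD_RECLAIM)

      /* Fragile: any overlap with the atomic mask reads as "cannot sleep". */
      static bool may_sleep_fragile(unsigned int flags)
      {
          return !(flags & MASK_ATOMIC);
      }

      /* Robust: test the single bit that says blocking is allowed. */
      static bool may_sleep(unsigned int flags)
      {
          return (flags & F_DIRECT_RECLAIM) != 0;
      }
      ```

      In this model a request made with MASK_KERNEL is allowed to sleep,
      yet the fragile test rejects it because the two masks share
      F_KSWAPD_RECLAIM.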
    • sh64: fix __NR_fgetxattr · 2d33fa10
      Dmitry V. Levin authored
      According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
      has to be defined as 259, but it isn't.  Instead, it's defined as 269,
      which is of course used by another syscall, __NR_sched_setaffinity in this
      case.
      
      This bug was found by strace test suite.
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d33fa10
    • ocfs2: fix SGID not inherited issue · 854ee2e9
      Junxiao Bi authored
      Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an
      issue: the SGID of a subdirectory was not inherited from its parent
      directory.  This is because SGID is set in "inode->i_mode" in
      ocfs2_get_init_inode(), but is later overwritten by "mode", which
      doesn't have SGID set.
      
      Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      854ee2e9
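      The overwrite described in the ocfs2 commit above can be modelled in
      user space.  This is a sketch of the failure mode only: `init_mode`,
      `final_mode_buggy`, and `final_mode_kept` are hypothetical helpers,
      and the last one shows just one possible way to keep the inherited
      bit, not necessarily what the actual patch does.

      ```c
      #include <sys/stat.h>
      #include <sys/types.h>

      /* A new directory created in a set-group-ID directory inherits S_ISGID. */
      static mode_t init_mode(mode_t parent_mode, mode_t requested)
      {
          mode_t mode = requested;

          if ((parent_mode & S_ISGID) && S_ISDIR(requested))
              mode |= S_ISGID;
          return mode;
      }

      /* Buggy ordering: the requested mode is stored last and drops the bit. */
      static mode_t final_mode_buggy(mode_t parent_mode, mode_t requested)
      {
          mode_t inode_mode = init_mode(parent_mode, requested);

          inode_mode = requested;   /* overwrites the inherited S_ISGID */
          return inode_mode;
      }

      /* One possible remedy: keep the value computed at initialization. */
      static mode_t final_mode_kept(mode_t parent_mode, mode_t requested)
      {
          return init_mode(parent_mode, requested);
      }
      ```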
    • mm/oom_kill.c: avoid attempting to kill init sharing same memory · a2b829d9
      Chen Jie authored
      It's possible that an oom killed victim shares an ->mm with the init
      process and thus oom_kill_process() would end up trying to kill init as
      well.
      
      This has been shown in practice:
      
      	Out of memory: Kill process 9134 (init) score 3 or sacrifice child
      	Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB
      	Kill process 1 (init) sharing same memory
      	...
      	Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
      
      And this will result in a kernel panic.
      
      If a process is forked by init and selected for oom kill while still
      sharing init_mm, then it's likely this system is in a recoverable state.
      However, it's better not to try to kill init and allow the machine to
      panic due to unkillable processes.
      
      [rientjes@google.com: rewrote changelog]
      [akpm@linux-foundation.org: fix inverted test, per Ben]
      Signed-off-by: Chen Jie <chenjie6@huawei.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a2b829d9
    • drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 26bbe7ef
      Seth Jennings authored
      Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
      x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
      generic x86 64bit") introduced large block sizes for x86.  This made it
      possible to have multiple sections per memory block where previously,
      there was only ever one section per block.
      
      Since blocks consist of contiguous ranges of sections, there can be holes
      in the blocks where sections are not present.  If one attempts to
      offline such a block, a crash occurs since the code is not designed to
      deal with this.
      
      This patch is a quick fix to guard against the crash by not allowing
      blocks with non-present sections to be offlined.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781
      Signed-off-by: Seth Jennings <sjennings@variantweb.net>
      Reported-by: Andrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26bbe7ef
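      The guard described in the commit above amounts to scanning a block's
      sections before allowing the offline.  A minimal sketch, assuming a
      boolean presence array per block; `check_block_offline` and the
      choice of -EINVAL are assumptions for the example, not taken from
      the patch.

      ```c
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>

      /*
       * present[i] says whether section i of the memory block exists.
       * Returns 0 if the block may be offlined, or -EINVAL if any section
       * is missing, so a block with holes is never offlined.
       */
      static int check_block_offline(const bool *present, size_t nr_sections)
      {
          for (size_t i = 0; i < nr_sections; i++) {
              if (!present[i])
                  return -EINVAL;
          }
          return 0;
      }
      ```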
    • tmpfs: fix shmem_evict_inode() warnings on i_blocks · 267a4c76
      Hugh Dickins authored
      Dmitry Vyukov provides a little program, autogenerated by syzkaller,
      which races a fault on a mapping of a sparse memfd object, against
      truncation of that object below the fault address: run repeatedly for a
      few minutes, it reliably generates shmem_evict_inode()'s
      WARN_ON(inode->i_blocks).
      
      (But there's nothing specific to memfd here, nor to the fstat which it
      happened to use to generate the fault: though that looked suspicious,
      since a shmem_recalc_inode() had been added there recently.  The same
      problem can be reproduced with open+unlink in place of memfd_create, and
      with fstatfs in place of fstat.)
      
      v3.7 commit 0f3c42f5 ("tmpfs: change final i_blocks BUG to WARNING")
      explains one cause of such a warning (a race with shmem_writepage to
      swap), and possible solutions; but we never took it further, and this
      syzkaller incident turns out to have a different cause.
      
      shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
      then found to be beyond eof, looks plausible - decrementing the alloced
      count that was just before incremented - but in fact can go wrong, if a
      racing thread (the truncator, for example) gets its shmem_recalc_inode()
      in just after our delete_from_page_cache().  delete_from_page_cache()
      decrements nrpages, that shmem_recalc_inode() will balance the books by
      decrementing alloced itself, then our decrement of alloced takes it one
      too low: leading to the WARNING when the object is finally evicted.
      
      Once the new page has been exposed in the page cache,
      shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
      the accounting right in all cases (and not fall through from "trunc:" to
      "decused:").  Adjust that error recovery block; and the reinitialization
      of info and sbinfo can be removed too.
      
      While we're here, fix shmem_writepage() to avoid the original issue: it
      will be safe against a racing shmem_recalc_inode(), if it merely
      increments swapped before the shmem_delete_from_page_cache() which
      decrements nrpages (but it must then do its own shmem_recalc_inode()
      before that, while still in balance, instead of after).  (Aside: why do
      we shmem_recalc_inode() here in the swap path? Because its raison d'etre
      is to cope with clean sparse shmem pages being reclaimed behind our
      back: so here when swapping is a good place to look for that case.) But
      I've not now managed to reproduce this bug, even without the patch.
      
      I don't see why I didn't do that earlier: perhaps inhibited by the
      preference to eliminate shmem_recalc_inode() altogether.  Driven by this
      incident, I do now have a patch to do so at last; but still want to sit
      on it for a bit, there's a couple of questions yet to be resolved.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      267a4c76
    • mm/hugetlb.c: fix resv map memory leak for placeholder entries · dbe409e4
      Mike Kravetz authored
      Dmitry Vyukov reported the following memory leak
      
      unreferenced object 0xffff88002eaafd88 (size 32):
        comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s)
        hex dump (first 32 bytes):
          28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff  (.Nc....(.Nc....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
           kmalloc include/linux/slab.h:458
           region_chg+0x2d4/0x6b0 mm/hugetlb.c:398
           __vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791
           vma_needs_reservation mm/hugetlb.c:1813
           alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845
           hugetlb_no_page mm/hugetlb.c:3543
           hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717
           follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880
           __get_user_pages+0x542/0xf30 mm/gup.c:497
           populate_vma_page_range+0xde/0x110 mm/gup.c:919
           __mm_populate+0x1c7/0x310 mm/gup.c:969
           do_mlock+0x291/0x360 mm/mlock.c:637
           SYSC_mlock2 mm/mlock.c:658
           SyS_mlock2+0x4b/0x70 mm/mlock.c:648
      
      Dmitry identified a potential memory leak in the routine region_chg,
      where a region descriptor is not freed on an error path.
      
      However, the root cause for the above memory leak resides in region_del.
      In this specific case, a "placeholder" entry is created in region_chg.
      The associated page allocation fails, and the placeholder entry is left
      in the reserve map.  This is "by design" as the entry should be deleted
      when the map is released.  The bug is in the region_del routine which is
      used to delete entries within a specific range (and when the map is
      released).  region_del did not handle the case where a placeholder entry
      exactly matched the start of the range to be deleted.  In this case,
      the entry would not be deleted and would leak.  The fix is to take
      these special placeholder entries into account in region_del.
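
      As a rough illustration of the skip test described above, here is a
      minimal userspace sketch.  It is a simplified model, not the kernel
      code: struct region, skip_before_fix(), skip_after_fix() and
      count_visited() are hypothetical names, and a placeholder is assumed
      to be an entry with from == to.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a reserve map entry covering [from, to). */
struct region {
    long from;
    long to;
};

/* Pre-fix test: skip every entry that ends at or before the range start f. */
static bool skip_before_fix(const struct region *rg, long f)
{
    return rg->to <= f;
}

/* Post-fix test: a placeholder (from == to) sitting exactly at f is not skipped. */
static bool skip_after_fix(const struct region *rg, long f)
{
    bool placeholder_at_start = (rg->from == rg->to) && (rg->to == f);

    return rg->to <= f && !placeholder_at_start;
}

/*
 * Count the entries that a delete of [f, t) would visit in a sorted map,
 * i.e. entries that are neither skipped nor located past the end of the range.
 */
static int count_visited(const struct region *map, int n, long f, long t,
                         bool (*skip)(const struct region *, long))
{
    int visited = 0;

    assert(f <= t);
    for (int i = 0; i < n; i++) {
        if (skip(&map[i], f))
            continue;
        if (map[i].from >= t)
            break;
        visited++;
    }
    return visited;
}
```

      With a map of {0,2}, {5,5}, {5,8} and a delete of [5, 10), the pre-fix
      test visits only {5,8}, so the placeholder {5,5} stays behind; the
      post-fix test visits both.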
      
      The region_chg error path leak is also fixed.
      
      Fixes: feba16e2 ("mm/hugetlb: add region_del() to delete a specific range of entries")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: <stable@vger.kernel.org>	[4.3+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbe409e4
    • Naoya Horiguchi's avatar
      mm: hugetlb: call huge_pte_alloc() only if ptep is null · 0d777df5
      Naoya Horiguchi authored
      Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
      and check whether the obtained *ptep is a migration/hwpoison entry or
      not.  And if not, then we get to call huge_pte_alloc().  This is racy
      because the *ptep could turn into migration/hwpoison entry after the
      huge_pte_offset() check.  This race results in BUG_ON in
      huge_pte_alloc().
      
      We don't have to call huge_pte_alloc() when huge_pte_offset() returns
      non-NULL, so let's fix this bug by moving the call into the else
      block.
      
      Note that the *ptep could turn into a migration/hwpoison entry after
      this block, but that's not a problem because we have another
      !pte_present check later (we never go into hugetlb_no_page() in that
      case.)
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>	[2.6.36+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0d777df5
    • Chris Wilson's avatar
      kernel: remove stop_machine() Kconfig dependency · 86fffe4a
      Chris Wilson authored
      Currently the full stop_machine() routine is only enabled on SMP if
      module unloading is enabled, or if the CPUs are hotpluggable.  This
      leads to configurations where stop_machine() is broken as it will then
      only run the callback on the local CPU with irqs disabled, and not stop
      the other CPUs or run the callback on them.
      
      For example, this breaks MTRR setup on x86 in certain configs since
      ea8596bb ("kprobes/x86: Remove unused text_poke_smp() and
      text_poke_smp_batch() functions") as the MTRR is only established on the
      boot CPU.
      
      This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
      and HOTPLUG_CPU config options to compile the correct stop_machine() for
      the architecture, removing the false dependency on MODULE_UNLOAD in the
      process.
      
      Link: https://lkml.org/lkml/2014/10/8/124
      References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86fffe4a
    • Nicolas Iooss's avatar
      mm: kmemleak: mark kmemleak_init prototype as __init · 98e89cf0
      Nicolas Iooss authored
      The kmemleak_init() definition in mm/kmemleak.c is marked __init but its
      prototype in include/linux/kmemleak.h is marked __ref since commit
      a6186d89 ("kmemleak: Mark the early log buffer as __initdata").
      
      This causes a section mismatch which is reported as a warning when
      building with clang -Wsection, because kmemleak_init() is declared in
      section .ref.text but defined in .init.text.
      
      Fix this by marking kmemleak_init() prototype __init.
      Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98e89cf0
    • Hugh Dickins's avatar
      mm: fix kerneldoc on mem_cgroup_replace_page · 25be6a65
      Hugh Dickins authored
      Whoops, I missed removing the kerneldoc comment of the lrucare arg
      removed from mem_cgroup_replace_page; but it's a good comment, keep it.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      25be6a65
    • Hugh Dickins's avatar
      osd fs: __r4w_get_page rely on PageUptodate for uptodate · 3066a967
      Hugh Dickins authored
      Commit 42cb14b1 ("mm: migrate dirty page without
      clear_page_dirty_for_io etc") simplified the migration of a PageDirty
      pagecache page: one stat needs moving from zone to zone and that's about
      all.
      
      It's convenient and safest for it to shift the PageDirty bit from old
      page to new, just before updating the zone stats: before copying data
      and marking the new PageUptodate.  This is all done while both pages are
      isolated and locked, just as before; and just as before, there's a
      moment when the new page is visible in the radix_tree, but not yet
      PageUptodate.  What's new is that it may now be briefly visible as
      PageDirty before it is PageUptodate.
      
      When I scoured the tree to see if this could cause a problem anywhere,
      the only places I found were in two similar functions __r4w_get_page():
      which look up a page with find_get_page() (not using page lock), then
      claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
      
      I'm not sure whether that was right before, but now it might be wrong
      (on rare occasions): only claim the page is uptodate if PageUptodate.
      Or perhaps the page in question could never be migratable anyway?
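
      The change in the uptodate claim can be sketched in userspace as
      follows.  This is only an illustrative model: the flag constants and
      the two helper names are hypothetical and stand in for the kernel's
      PageUptodate/PageDirty/PageWriteback tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits modelling the three page states discussed above. */
enum {
    PG_UPTODATE_MODEL  = 1u << 0,
    PG_DIRTY_MODEL     = 1u << 1,
    PG_WRITEBACK_MODEL = 1u << 2,
};

static_assert((PG_UPTODATE_MODEL & PG_DIRTY_MODEL) == 0,
              "flag bits must be distinct");

/* Old rule: dirty or under writeback was taken to imply uptodate. */
static bool claim_uptodate_old(unsigned int flags)
{
    return (flags & (PG_DIRTY_MODEL | PG_WRITEBACK_MODEL | PG_UPTODATE_MODEL)) != 0;
}

/* New rule: only the uptodate bit itself counts. */
static bool claim_uptodate_new(unsigned int flags)
{
    return (flags & PG_UPTODATE_MODEL) != 0;
}
```

      For a page that is briefly dirty but not yet uptodate (the window
      during migration), the old rule claims uptodate and the new rule
      does not.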
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Boaz Harrosh <ooo@electrozaur.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3066a967
    • Johannes Weiner's avatar
      MAINTAINERS: make Vladimir co-maintainer of the memory controller · ed0f1e21
      Johannes Weiner authored
      Vladimir architected and authored much of the current state of the
      memcg's slab memory accounting and tracking.  Make sure he gets CC'd on
      bug reports ;-)
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed0f1e21
    • Michal Hocko's avatar
      mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · 373ccbe5
      Michal Hocko authored
      Tetsuo Handa has reported that the system might basically livelock in
      OOM condition without triggering the OOM killer.
      
      The issue is caused by an internal dependency of direct reclaim on
      vmstat counter updates (via zone_reclaimable), which are performed from
      workqueue context.  If all the current workers get assigned to an
      allocation request, though, they will be looping inside the allocator
      trying to reclaim memory, but zone_reclaimable can see stale numbers, so
      it will consider a zone reclaimable even though it has been scanned way
      too much.  The WQ concurrency logic will not consider this situation a
      congested workqueue because it relies on the worker having to sleep in
      such a situation.  This also means that it doesn't try to spawn new
      workers or invoke the rescuer thread if one is assigned to the
      queue.
      
      In order to fix this issue we need to do two things.  First we have to
      let the wq concurrency code know that we are in trouble, so we have to
      do a short sleep.  To avoid the issues handled by 0e093d99
      ("writeback: do not sleep on the congestion queue if there are no
      congested BDIs or if significant congestion is not being encountered in
      the current zone") we limit the sleep to worker threads, which are
      the ones of interest anyway.
      
      The second thing to do is to create a dedicated workqueue for vmstat and
      mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
      have a spare worker thread for it.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      373ccbe5
    • Vlastimil Babka's avatar
      mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo · 475a2f90
      Vlastimil Babka authored
      Commit 016c13da ("mm, page_alloc: use masks and shifts when
      converting GFP flags to migrate types") has swapped MIGRATE_MOVABLE and
      MIGRATE_RECLAIMABLE in the enum definition.  However, migratetype_names
      wasn't updated to reflect that.
      
      As a result, the file /proc/pagetypeinfo shows the counts for Movable as
      Reclaimable and vice versa.
      
      Additionally, commit 0aaa29a5 ("mm, page_alloc: reserve pageblocks
      for high-order atomic allocations on demand") introduced
      MIGRATE_HIGHATOMIC, but did not add a letter to distinguish it into
      show_migration_types(), so it doesn't appear in the listing of free
      areas during page alloc failures or oom kills.
      
      This patch fixes both problems.  The atomic reserves will show with a
      letter 'H' in the free areas listings.
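
      To illustrate how an enum reorder can silently mislabel a name table,
      here is a small standalone sketch.  The enum and both tables are
      hypothetical, trimmed-down stand-ins for the kernel definitions, and
      the index-keyed table is shown only as one way to keep names tied to
      enumerators; it is not necessarily how this patch does it.

```c
#include <assert.h>

/* Hypothetical, trimmed-down enum with Movable placed before Reclaimable. */
enum migratetype_model {
    MT_UNMOVABLE,
    MT_MOVABLE,
    MT_RECLAIMABLE,
    MT_HIGHATOMIC,
    MT_TYPES
};

/* Positional table left in the old order: Movable and Reclaimable are swapped. */
static const char *const names_positional[MT_TYPES] = {
    "Unmovable",
    "Reclaimable",
    "Movable",
    "HighAtomic",
};

/* Index-keyed table: each string is tied to its enumerator, so order cannot drift. */
static const char *const names_keyed[MT_TYPES] = {
    [MT_UNMOVABLE]   = "Unmovable",
    [MT_MOVABLE]     = "Movable",
    [MT_RECLAIMABLE] = "Reclaimable",
    [MT_HIGHATOMIC]  = "HighAtomic",
};

/* Look up the display name of a migrate type in the given table. */
static const char *migratetype_name(const char *const *table,
                                    enum migratetype_model mt)
{
    assert(mt < MT_TYPES);
    return table[mt];
}
```

      Looking up MT_MOVABLE in names_positional yields "Reclaimable", which
      is the kind of mislabelling described above; names_keyed yields
      "Movable".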
      
      Fixes: 016c13da ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types")
      Fixes: 0aaa29a5 ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      475a2f90
    • Vladimir Davydov's avatar
      memcg: fix memory.high target · 9516a18a
      Vladimir Davydov authored
      When the memory.high threshold is exceeded, try_charge() schedules a
      task_work to reclaim the excess.  The reclaim target is set to the
      number of pages requested by try_charge().
      
      This is wrong, because try_charge() usually charges more pages than
      requested (batch > nr_pages) in order to refill per cpu stocks.  As a
      result, a process in a cgroup can easily exceed memory.high
      significantly when doing a lot of charges w/o returning to userspace
      (e.g.  reading a file in big chunks).
      
      Fix this issue by ensuring that when exceeding memory.high a process
      reclaims as many pages as were actually charged (i.e.  batch).
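
      The arithmetic can be sketched with a small userspace model.  The
      batch size of 32 pages, the helper names, and the assumption that
      every scheduled reclaim fully succeeds are all illustrative choices
      for this sketch, not taken from the patch.

```c
#include <assert.h>

#define CHARGE_BATCH_MODEL 32U  /* assumed batch size for this sketch */

/* Pages actually charged for a request: at least one full batch. */
static unsigned int charge_batch(unsigned int nr_pages)
{
    assert(nr_pages > 0);
    return nr_pages > CHARGE_BATCH_MODEL ? nr_pages : CHARGE_BATCH_MODEL;
}

/*
 * Net growth of the charged page count after `charges` over-limit charges,
 * each followed by a reclaim of the chosen target.
 * fixed == 0: target is the requested nr_pages (old behaviour).
 * fixed != 0: target is the batch that was actually charged (new behaviour).
 */
static unsigned long net_growth(unsigned int nr_pages, unsigned int charges,
                                int fixed)
{
    unsigned long usage = 0;

    for (unsigned int i = 0; i < charges; i++) {
        unsigned int batch = charge_batch(nr_pages);
        unsigned int target = fixed ? batch : nr_pages;

        usage += batch;   /* pages charged */
        usage -= target;  /* pages reclaimed, assuming reclaim succeeds */
    }
    return usage;
}
```

      In this model, 100 single-page charges leave 3100 pages of excess
      with the old target and none with the new one.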
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9516a18a
    • Naoya Horiguchi's avatar
      mm: hugetlb: fix hugepage memory leak caused by wrong reserve count · a88c7695
      Naoya Horiguchi authored
      When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
      alloc_buddy_huge_page() to directly create a hugepage from the buddy
      allocator.
      
      In that case, however, if alloc_buddy_huge_page() succeeds we don't
      decrement h->resv_huge_pages, which means that a successful
      hugetlb_fault() returns without releasing the reserve count.  As a
      result, a subsequent hugetlb_fault() might fail even though there are
      still free hugepages.
      
      This patch simply adds decrementing code on that code path.
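
      A minimal userspace model of the accounting is sketched below.  The
      struct, the function name and the `fixed` switch are hypothetical;
      the fallback allocation is simply assumed to succeed, and the caller
      is assumed to hold a reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters involved. */
struct hstate_model {
    long free_huge_pages;
    long resv_huge_pages;
};

/*
 * Model one allocation by a caller that holds a reservation.
 * dequeue_ok: whether a page could be taken from the free pool.
 * fixed:      whether the fallback path also consumes the reservation.
 */
static void alloc_with_reserve(struct hstate_model *h, bool dequeue_ok,
                               bool fixed)
{
    assert(h);
    if (dequeue_ok) {
        /* Normal path: take a pooled page and consume the reservation. */
        h->free_huge_pages--;
        h->resv_huge_pages--;
        return;
    }
    /* Fallback: the page comes straight from the buddy allocator. */
    if (fixed)
        h->resv_huge_pages--;
}
```

      Starting from one free page and one reservation, a fallback
      allocation without the decrement leaves the reservation at 1, so the
      remaining free page still looks reserved; with the decrement it
      drops to 0.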
      
      I reproduced this problem when testing v4.3 kernel in the following situation:
       - the test machine/VM is a NUMA system,
       - hugepage overcommitting is enabled,
       - most of the hugepages are allocated and there's only one free hugepage
         which is on node 0 (for example),
       - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
         node 1, tries to allocate a hugepage,
       - the allocation should fail but the reserve count is still held.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org> [3.16+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a88c7695