1. 12 Dec, 2015 2 commits
    • memcg: fix memory.high target · 9516a18a
      Vladimir Davydov authored
      When the memory.high threshold is exceeded, try_charge() schedules a
      task_work to reclaim the excess.  The reclaim target is set to the
      number of pages requested by try_charge().
      
      This is wrong, because try_charge() usually charges more pages than
      requested (batch > nr_pages) in order to refill per cpu stocks.  As a
      result, a process in a cgroup can easily exceed memory.high
      significantly when doing a lot of charges without returning to userspace
      (e.g. reading a file in big chunks).
      
      Fix this issue by ensuring that, when memory.high is exceeded, a process
      reclaims as many pages as were actually charged (i.e. batch).
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
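The difference between the requested and the actually charged page count can be illustrated with a small userspace model. This is only a sketch, not the kernel's try_charge(): the struct, the function name and the batch size of 32 pages are assumptions made for illustration.

```c
/* Hypothetical userspace model of the charge path; not kernel code. */
#define CHARGE_BATCH_MODEL 32UL /* assumed per-cpu stock refill size, in pages */

struct cgroup_model {
    unsigned long usage;          /* pages currently charged */
    unsigned long high;           /* memory.high threshold, in pages */
    unsigned long reclaim_target; /* pages to reclaim on return to userspace */
};

/* Charge nr_pages and return the number of pages actually charged. */
static unsigned long charge_model(struct cgroup_model *cg, unsigned long nr_pages)
{
    /* More than requested is charged so the per-cpu stock gets refilled. */
    unsigned long batch = nr_pages > CHARGE_BATCH_MODEL ? nr_pages
                                                        : CHARGE_BATCH_MODEL;

    cg->usage += batch;
    if (cg->usage > cg->high) {
        /* The fix: target what was charged (batch), not nr_pages. */
        cg->reclaim_target += batch;
    }
    return batch;
}
```

With the old behaviour the target would grow by nr_pages only, so a request for a single page would add 32 pages of usage in this model but queue just one page for reclaim.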
    • mm: hugetlb: fix hugepage memory leak caused by wrong reserve count · a88c7695
      Naoya Horiguchi authored
      When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
      alloc_buddy_huge_page() to directly create a hugepage from the buddy
      allocator.
      
      In that case, however, if alloc_buddy_huge_page() succeeds we don't
      decrement h->resv_huge_pages, which means that a successful
      hugetlb_fault() returns without releasing the reserve count.  As a
      result, a subsequent hugetlb_fault() might fail even though there are
      still free hugepages.
      
      This patch simply adds decrementing code on that code path.
      
      I reproduced this problem when testing a v4.3 kernel in the following situation:
       - the test machine/VM is a NUMA system,
       - hugepage overcommitting is enabled,
       - most hugepages are allocated and there's only one free hugepage,
         which is on node 0 (for example),
       - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
         node 1, tries to allocate a hugepage,
       - the allocation should fail, but the reserve count is still held.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org> [3.16+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
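The accounting problem can be sketched with a simplified userspace model. It is not the code from mm/hugetlb.c: the struct, the function and its boolean parameters are hypothetical stand-ins for the real conditions (a held reservation, dequeue_huge_page_vma() succeeding, alloc_buddy_huge_page() succeeding).

```c
#include <stdbool.h>

/* Hypothetical model of the counters involved; not the kernel's struct hstate. */
struct hstate_model {
    long free_huge_pages;
    long resv_huge_pages;
};

/* Returns true if a hugepage was obtained for the faulting task. */
static bool alloc_huge_page_model(struct hstate_model *h, bool has_reserve,
                                  bool dequeue_ok, bool buddy_ok)
{
    if (dequeue_ok) {
        /* Normal path: take a page from the pool and consume the reserve. */
        h->free_huge_pages--;
        if (has_reserve)
            h->resv_huge_pages--;
        return true;
    }

    if (!buddy_ok)
        return false; /* both allocation paths failed */

    /*
     * Fallback path: the page came straight from the buddy allocator.
     * The decrement below models what the patch adds; without it the
     * reserve count stays elevated after a successful fault.
     */
    if (has_reserve)
        h->resv_huge_pages--;
    return true;
}
```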
  2. 11 Dec, 2015 5 commits
  3. 10 Dec, 2015 6 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 0bd0f1e6
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "Most are minor to important fixes.
      
        There is one performance enhancement that I took on the grounds that
        failing to check if other processes can run before running what's
        intended to be a background, idle-time task is a bug, even though the
        primary effect of the fix is to improve performance (and it was a very
        simple patch)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/mlx5: Postpone remove_keys under knowledge of coming preemption
        IB/mlx4: Use vmalloc for WR buffers when needed
        IB/mlx4: Use correct order of variables in log message
        iser-target: Remove explicit mlx4 work-around
        mlx4: Expose correct max_sge_rd limit
        IB/mad: Require CM send method for everything except ClassPortInfo
        IB/cma: Add a missing rcu_read_unlock()
        IB core: Fix ib_sg_to_pages()
        IB/srp: Fix srp_map_sg_fr()
        IB/srp: Fix indirect data buffer rkey endianness
        IB/srp: Initialize dma_length in srp_map_idb
        IB/srp: Fix possible send queue overflow
        IB/srp: Fix a memory leak
        IB/sa: Put netlink request into the request list before sending
        IB/iser: use sector_div instead of do_div
        IB/core: use RCU for uverbs id lookup
        IB/qib: Minor fixes to qib per SFF 8636
        IB/core: Fix user mode post wr corruption
        IB/qib: Fix qib_mr structure
    • Merge tag 'sound-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · a80c47da
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Again less intensive changes in this rc: you can find only a few
        HD-audio fixes (noise fixes for Intel Broxton chip and a few Thinkpad
        models, quirks for Alienware 17 and Packard Bell DOTS) in addition to
        a long-standing rme96 bug fix"
      
      * tag 'sound-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/ca0132 - quirk for Alienware 17 2015
        ALSA: hda - Fix noise problems on Thinkpad T440s
        ALSA: hda - Fixing speaker noise on the two latest thinkpad models
        ALSA: hda - Add inverted dmic for Packard Bell DOTS
        ALSA: hda - Fix playback noise with 24/32 bit sample size on BXT
        ALSA: rme96: Fix unexpected volume reset after rate changes
    • dm btree: fix bufio buffer leaks in dm_btree_del() error path · ed8b45a3
      Joe Thornber authored
      If dm_btree_del()'s call to push_frame() fails, e.g. due to
      btree_node_validator finding invalid metadata, the dm_btree_del() error
      path must unlock all frames (which have active dm-bufio buffers) that
      were pushed onto the del_stack.
      
      Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
      buffers have leaked, e.g.:
        device-mapper: bufio: leaked buffer 3, hold count 1, list 0
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
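The error-path rule described above (release every frame already pushed when a later push fails) can be sketched in a userspace model. All names below are hypothetical; the model only counts held buffers and does not reproduce the dm-btree or dm-bufio data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 8

struct frame_model {
    int block;
    bool locked; /* stands in for an active dm-bufio buffer */
};

struct del_stack_model {
    struct frame_model frames[MAX_FRAMES];
    size_t depth;
    int held; /* number of buffers currently held */
};

/* Push one frame; fails if the block is invalid or the stack is full. */
static int push_frame_model(struct del_stack_model *s, int block, bool valid)
{
    if (!valid || s->depth == MAX_FRAMES)
        return -1;
    s->frames[s->depth].block = block;
    s->frames[s->depth].locked = true;
    s->depth++;
    s->held++;
    return 0;
}

/* Release every frame still on the stack. */
static void unlock_all_frames_model(struct del_stack_model *s)
{
    while (s->depth > 0) {
        struct frame_model *f = &s->frames[--s->depth];

        f->locked = false;
        s->held--;
    }
}

/* Push n frames, then release them all, including after a failed push. */
static int btree_del_model(struct del_stack_model *s, const bool *valid, size_t n)
{
    int r = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        r = push_frame_model(s, (int)i, valid[i]);
        if (r)
            break; /* frames pushed so far are still locked here */
    }

    /* Without this call, the early exit above would leak held buffers. */
    unlock_all_frames_model(s);
    return r;
}
```

A held count of zero after btree_del_model() returns corresponds to the condition that lets client teardown proceed without reporting a leaked buffer.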
    • Merge tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio · 6764e5eb
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Various fixes for removing redundancy, const'ifying structs, avoiding
         stack usage, fixing WARN usage (Krzysztof Kozlowski, Julia Lawall,
         Kees Cook, Dan Carpenter)
      
       - Revert No-IOMMU mode as the intended user has not emerged (Alex
         Williamson)
      
      * tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio:
        Revert: "vfio: Include No-IOMMU mode"
        vfio: fix a warning message
        vfio: platform: remove needless stack usage
        vfio-pci: constify pci_error_handlers structures
        vfio: Drop owner assignment from platform_driver
    • Merge tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · eef121f4
      Linus Torvalds authored
      Pull DT fixes from Rob Herring:
       "I think this should be all for 4.4:
      
         - Fix incorrect warning about overlapping memory regions
      
         - Export of_irq_find_parent again which was made static in 4.4, but
           has users pending for 4.5.
      
         - Fix of_msi_map_rid declaration location
      
         - Fix re-entrancy for of_fdt_unflatten_tree
      
         - Clean-up of phys_addr_t printks"
      
      * tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        of/irq: move of_msi_map_rid declaration to the correct ifdef section
        of/irq: Export of_irq_find_parent again
        of/fdt: Add mutex protection for calls to __unflatten_device_tree()
        of/address: fix typo in comment block of of_translate_one()
        of: do not use 0x in front of %pa
        of: Fix comparison of reserved memory regions
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · abb7e2b3
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One small build fix, a couple do_div() fixes, and a fix for the gpio
        basic clock type are the major changes here.  There's also a couple
        fixes for the TI, sunxi, and scpi clock drivers"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: sunxi: pll2: Fix clock running too fast
        clk: scpi: add missing of_node_put
        clk: qoriq: fix memory leak
        imx/clk-pllv2: fix wrong do_div() usage
        imx/clk-pllv1: fix wrong do_div() usage
        clk: mmp: add linux/clk.h includes
        clk: ti: drop locking code from mux/divider drivers
        clk: ti816x: Add missing dmtimer clkdev entries
        clk: ti: fapll: fix wrong do_div() usage
        clk: ti: clkt_dpll: fix wrong do_div() usage
        clk: gpio: Get parent clk names in of_gpio_clk_setup()
  4. 09 Dec, 2015 19 commits
  5. 08 Dec, 2015 8 commits
    • IB/mlx5: Postpone remove_keys under knowledge of coming preemption · ab5cdc31
      Leon Romanovsky authored
      The remove_keys() logic runs as a garbage collection task. Such a
      task is intended to run when no other active processes are running.

      need_resched() returns true if there are user tasks waiting to be
      scheduled in the near future.

      In that case, we don't execute remove_keys() and postpone
      the garbage collection work to the next cycle,
      in order to free CPU resources for other tasks.

      A possible scenario that triggers this:
      1. Allocate a lot of MRs to fill the cache above the limit.
      2. Wait a small amount of time "to calm" the system.
      3. Start CPU-intensive operations on a multi-node cluster.
      4. Expect performance degradation during the MR cache shrink operation.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
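The postponement logic can be sketched as a userspace model. The cache struct and function below are hypothetical, and the result of need_resched() is passed in as a plain boolean because that kernel function is not available outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical model of an MR cache entry; not the mlx5 driver's structures. */
struct mr_cache_model {
    int cur;       /* MRs currently cached */
    int limit;     /* shrink the cache while cur is above this */
    int postponed; /* how many times the work was deferred */
};

/*
 * One garbage collection step. resched_pending stands in for the value
 * need_resched() would return. Returns true if one entry was removed.
 */
static bool cache_gc_step_model(struct mr_cache_model *c, bool resched_pending)
{
    if (c->cur <= c->limit)
        return false; /* cache is within its limit, nothing to shrink */

    if (resched_pending) {
        /* Other tasks want the CPU: defer the shrink to the next cycle. */
        c->postponed++;
        return false;
    }

    c->cur--; /* models removing a single key from the cache */
    return true;
}
```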
    • IB/mlx4: Use vmalloc for WR buffers when needed · 0ef2f05c
      Wengang Wang authored
      There have been several reports of WR buffer allocation (kmalloc) failing.
      It failed at order 3 and/or order 4 contiguous page allocations, even though
      there was actually 100MB+ of free memory, but heavily fragmented.
      So fall back to vmalloc when kmalloc fails.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
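The fallback pattern can be sketched in userspace. Both allocator functions below are hypothetical stand-ins built on malloc(): the first fails above a size threshold to imitate a fragmented system where large contiguous allocations are unavailable, the second imitates an allocator that does not need physically contiguous pages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct wr_buf_model {
    void *data;
    bool is_virt; /* true if the fallback allocator was used */
};

/* Stand-in for a contiguous allocator: fails above max_contig bytes. */
static void *contig_alloc_model(size_t size, size_t max_contig)
{
    return size <= max_contig ? malloc(size) : NULL;
}

/* Stand-in for an allocator that tolerates fragmentation. */
static void *virt_alloc_model(size_t size)
{
    return malloc(size);
}

/* Try the contiguous allocator first and fall back when it fails. */
static int wr_buf_alloc_model(struct wr_buf_model *b, size_t size,
                              size_t max_contig)
{
    b->is_virt = false;
    b->data = contig_alloc_model(size, max_contig);
    if (!b->data) {
        b->data = virt_alloc_model(size);
        b->is_virt = b->data != NULL;
    }
    return b->data ? 0 : -1;
}

static void wr_buf_free_model(struct wr_buf_model *b)
{
    free(b->data);
    b->data = NULL;
}
```

In the kernel, a buffer that may have come from either kmalloc() or vmalloc() can be released with kvfree(), which handles both cases.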
    • IB/mlx4: Use correct order of variables in log message · 73d4da7b
      Wengang Wang authored
      The variables in an mlx4 log message are in the wrong order. Fix it.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • Merge branch 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 5406812e
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "More change than I'd have liked at this stage.  The pids controller
        and the changes made to cgroup core to support it introduced and
        revealed several important issues.
      
         - Assigning membership to a newly created task and migrating it can
           race leading to incorrect accounting.  Oleg fixed it by widening
           threadgroup synchronization.  It looks like we'll be able to merge
           it with a different percpu rwsem which is used in fork path making
           things simpler and cheaper.
      
         - The recent change to extend cgroup membership to zombies (so that
           pid accounting can extend till the pid is actually released) missed
           pinning the underlying data structures leading to use-after-free.
           Fixed.
      
         - v2 hierarchy was calling subsystem callbacks with the wrong target
           cgroup_subsys_state based on the incorrect assumption that they
           share the same target.  pids is the first controller affected by
           this.  Subsys callbacks updated so that they can deal with
           multi-target migrations"
      
      * 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup_pids: don't account for the root cgroup
        cgroup: fix handling of multi-destination migration from subtree_control enabling
        cgroup_freezer: simplify propagation of CGROUP_FROZEN clearing in freezer_attach()
        cgroup: pids: kill pids_fork(), simplify pids_can_fork() and pids_cancel_fork()
        cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
        cgroup: make css_set pin its css's to avoid use-afer-free
        cgroup: fix cftype->file_offset handling
    • Merge branch 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 633bb738
      Linus Torvalds authored
      Pull libata fixes from Tejun Heo:
       "Nothing too interesting.  All are device specific additions and
        workarounds"
      
      * 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ata/sata_fsl.c: add ATA_FLAG_NO_LOG_PAGE to blacklist the controller for log page reads
        libata-eh.c: Introduce new ata port flag for controller which lockup on read log page
        sata_sil: disable trim
        AHCI: Fix softreset failed issue of Port Multiplier
        sata/mvebu: use #ifdef around suspend/resume code
        ahci: Order SATA device IDs for codename Lewisburg
        ahci: Add Device ID for Intel Sunrise Point PCH
    • um: fix returns without va_end · 887a9853
      Geyslan G. Bem authored
      When using va_list ensure that va_start will be followed by va_end.
      Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
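The rule can be illustrated with a short standard C example. The function below is hypothetical and is not taken from the UML sources; it shows one way to structure a variadic function so that no return path sits between va_start and va_end.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf. Returns the number of characters written, or -1 on
 * bad arguments, a formatting error, or truncation.
 */
static int format_checked(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    /* Early return is fine here: va_start has not been called yet. */
    if (!buf || len == 0)
        return -1;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap); /* always reached once va_start has run */

    /* Error checks come after va_end, so they can return freely. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Keeping the error checks after va_end avoids the pattern the commit fixes, where a return statement placed between the two macros skips va_end.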
    • um: Fix fpstate handling · 8090bfd2
      Richard Weinberger authored
      The x86 FPU cleanup changed fpstate to a plain integer.
      UML on x86 has to deal with that too.
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • arch: um: fix error when linking vmlinux. · fb1770aa
      Lorenzo Colitti authored
      On gcc Ubuntu 4.8.4-2ubuntu1~14.04, linking vmlinux fails with:
      
      arch/um/os-Linux/built-in.o: In function `os_timer_create':
      /android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'
      arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
      /android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime'
      arch/um/os-Linux/built-in.o: In function `os_timer_remain':
      /android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime'
      arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
      /android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime'
      arch/um/os-Linux/built-in.o: In function `os_timer_disable':
      /android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime'
      
      This is because -lrt appears in the generated link commandline
      after arch/um/os-Linux/built-in.o. Fix this by removing -lrt from
      arch/um/Makefile and adding it to the UM-specific section of
      scripts/link-vmlinux.sh.
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>