  1. 10 Jun, 2015 7 commits
    • sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappings · 8e76d4ee
      Mel Gorman authored
      Jovi Zhangwei reported the following problem:
      
        Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
        with GFP_COMP flag.
      
        [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping:          (null) index:0x0
        [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
        [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page))
        [Mon May 25 05:29:33 2015] ------------[ cut here ]------------
        [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
        [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP
      
      In this case it was triggered by running tcpdump, but it is not
      necessarily reproducible on all systems.
      
        sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap
      
      Compound pages cannot be migrated and it was not expected that such pages
      be marked for NUMA balancing.  This did not take into account that drivers
      such as net/packet/af_packet.c may insert compound pages into userspace
      with vm_insert_page.  This patch tells the NUMA balancing protection
      scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
      compound pages are marked for migration.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Jovi Zhangwei <jovi@cloudflare.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
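The skip rule described in this commit can be modeled in userspace as a minimal sketch. The flag values, `struct model_vma`, and both function names are hypothetical stand-ins chosen for illustration, not the kernel's definitions; the sketch only shows the idea that a scanner filters out whole mappings by flag before any page in them can be marked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; the kernel defines its own values. */
#define MODEL_VM_HUGETLB  0x1u
#define MODEL_VM_MIXEDMAP 0x2u

/* Hypothetical, heavily reduced stand-in for a memory mapping. */
struct model_vma {
    unsigned int flags;
    bool migratable;
};

/* Returns true when the scanner should leave this mapping alone. */
static bool numa_scan_should_skip(const struct model_vma *vma)
{
    if (!vma->migratable)
        return true;
    if (vma->flags & MODEL_VM_HUGETLB)
        return true;
    /* Mixed mappings may hold driver-inserted compound pages: skip them. */
    if (vma->flags & MODEL_VM_MIXEDMAP)
        return true;
    return false;
}

/* Counts the mappings that would still be scanned. */
static size_t count_scannable(const struct model_vma *vmas, size_t n)
{
    size_t count = 0;

    assert(vmas != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!numa_scan_should_skip(&vmas[i]))
            count++;
    }
    return count;
}
```

Filtering at the mapping level means the per-page path never has to tell a migratable page from a driver-owned compound page.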
    • zsmalloc: fix a null pointer dereference in destroy_handle_cache() · 02f7b414
      Sergey Senozhatsky authored
      If zs_create_pool()->create_handle_cache()->kmem_cache_create() or
      pool->name allocation fails, zs_create_pool()->destroy_handle_cache()
      will dereference the NULL pool->handle_cachep.
      
      Modify destroy_handle_cache() to avoid this.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
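The guard described in this commit can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct model_pool`, `struct model_cache`, `model_cache_create()` and `model_cache_destroy()` are hypothetical stand-ins for the pool and the slab cache, and the `destroy_calls` counter exists only to make the behavior observable. Only the name `destroy_handle_cache()` comes from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache. */
struct model_cache {
    int live;
};

/* Hypothetical, reduced stand-in for the pool. */
struct model_pool {
    struct model_cache *handle_cachep;
    unsigned int destroy_calls; /* illustration only */
};

static struct model_cache *model_cache_create(void)
{
    struct model_cache *cache = malloc(sizeof(*cache));

    if (cache)
        cache->live = 1;
    return cache;
}

/* Like the real destructor in the report, this must not be given NULL. */
static void model_cache_destroy(struct model_cache *cache)
{
    assert(cache != NULL);
    cache->live = 0;
    free(cache);
}

/* Safe on a pool whose cache was never created (error-path cleanup). */
static void destroy_handle_cache(struct model_pool *pool)
{
    if (!pool->handle_cachep)
        return;
    model_cache_destroy(pool->handle_cachep);
    pool->handle_cachep = NULL;
    pool->destroy_calls++;
}
```

With the check in place, the pool-creation error path can call the cleanup helper unconditionally, whether or not the cache allocation succeeded.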
    • mm: memcontrol: fix false-positive VM_BUG_ON() on -rt · f371763a
      Johannes Weiner authored
      On -rt, the VM_BUG_ON(!irqs_disabled()) triggers inside the memcg
      swapout path because the spin_lock_irq(&mapping->tree_lock) in the
      caller doesn't actually disable the hardware interrupts - which is fine,
      because on -rt the tophalves run in process context and so we are still
      safe from preemption while updating the statistics.
      
      Remove the VM_BUG_ON() but keep the comment of what we rely on.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Clark Williams <williams@redhat.com>
      Cc: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • checkpatch: fix "GLOBAL_INITIALISERS" test · 5129e87c
      Joe Perches authored
      Commit d5e616fc ("checkpatch: add a few more --fix corrections")
      broke the GLOBAL_INITIALISERS test with bad parentheses and optional
      leading spaces.
      
      Fix it.
      Signed-off-by: Joe Perches <joe@perches.com>
      Reported-by: Bandan Das <bsd@makefile.in>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: clear disk io accounting when reset zram device · d7ad41a1
      Weijie Yang authored
      Clear zram disk io accounting when resetting the zram device.  Otherwise
      the residual io accounting stat will affect the diskstat in the next
      zram active cycle.
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: do not call reclaim if !__GFP_WAIT · 7d638093
      Vladimir Davydov authored
      When trimming memcg consumption excess (see memory.high), we call
      try_to_free_mem_cgroup_pages without checking if we are allowed to sleep
      in the current context, which can result in a deadlock.  Fix this.
      
      Fixes: 241994ed ("mm: memcontrol: default hierarchy interface for memory")
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
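The check this commit adds can be sketched in userspace as follows. The model assumes a single group with no hierarchy; `MODEL_GFP_WAIT`, `struct model_memcg`, `model_reclaim()` and `trim_excess()` are hypothetical names for illustration, and the flag bit is not the kernel's actual `__GFP_WAIT` value. The point shown is the ordering: the "may we sleep?" test comes before any attempt to reclaim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's __GFP_WAIT value. */
#define MODEL_GFP_WAIT 0x10u

/* Hypothetical, reduced stand-in for a memory cgroup. */
struct model_memcg {
    unsigned long usage;         /* pages currently charged */
    unsigned long high;          /* soft ceiling, in the spirit of memory.high */
    unsigned long reclaim_calls; /* illustration only */
};

/* Stand-in for reclaim, which in the real kernel may sleep. */
static void model_reclaim(struct model_memcg *memcg, unsigned long nr_pages)
{
    unsigned long freed = nr_pages < memcg->usage ? nr_pages : memcg->usage;

    memcg->usage -= freed;
    memcg->reclaim_calls++;
}

/* Trim excess over the high mark, but only when the caller may sleep. */
static void trim_excess(struct model_memcg *memcg, unsigned long nr_pages,
                        unsigned int gfp_mask)
{
    assert(memcg != NULL);

    /* Atomic context: reclaim could sleep, so do nothing. */
    if (!(gfp_mask & MODEL_GFP_WAIT))
        return;
    if (memcg->usage <= memcg->high)
        return;
    model_reclaim(memcg, nr_pages);
}
```

Skipping the trim in a non-sleeping context leaves the group temporarily above its high mark; a later charge from a context that may sleep can still bring it back down.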
    • mm/memory_hotplug.c: set zone->wait_table to null after freeing it · 85bd8399
      Gu Zheng authored
      Izumi found the following oops when hot re-adding a node:
      
          BUG: unable to handle kernel paging request at ffffc90008963690
          IP: __wake_up_bit+0x20/0x70
          Oops: 0000 [#1] SMP
          CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
          Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
          task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
          RIP: 0010:[<ffffffff810dff80>]  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
          RSP: 0018:ffff880017b97be8  EFLAGS: 00010246
          RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
          RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
          RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
          R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
          FS:  00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
          Call Trace:
            unlock_page+0x6d/0x70
            generic_write_end+0x53/0xb0
            xfs_vm_write_end+0x29/0x80 [xfs]
            generic_perform_write+0x10a/0x1e0
            xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
            xfs_file_write_iter+0x79/0x120 [xfs]
            __vfs_write+0xd4/0x110
            vfs_write+0xac/0x1c0
            SyS_write+0x58/0xd0
            system_call_fastpath+0x12/0x76
          Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
          RIP  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
           RSP <ffff880017b97be8>
          CR2: ffffc90008963690
      
      Reproduction method (re-add a node):
        Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)
      
      This appears to be a use-after-free problem. The root cause is that
      zone->wait_table was not set to *NULL* after being freed in
      try_offline_node.

      When a node is hot re-added, its pgdat is reused, and so are its zone
      structs. When pages are added to the target zone, the zone is
      initialized first (including the wait_table) if it is not already
      initialized. Whether a zone counts as initialized is judged by
      zone->wait_table:
      
      	static inline bool zone_is_initialized(struct zone *zone)
      	{
      		return !!zone->wait_table;
      	}
      
      So if zone->wait_table is not set to *NULL* after it is freed, the
      memory hotplug routine skips initializing the zone when the node is
      hot re-added, and the wait_table still points to the freed memory.
      The invalid address is then accessed when waking up the waiters after
      I/O on the page completes, as in the oops above.
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
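The add, remove, re-add sequence from this commit can be modeled in userspace. This is a minimal sketch, assuming a zone reduced to its wait table: `struct model_zone`, `zone_init_if_needed()`, `zone_offline()`, `MODEL_WAIT_TABLE_ENTRIES` and the `init_count` field are hypothetical, while `zone_is_initialized()` mirrors the helper quoted in the commit message. It shows why resetting the pointer matters: the pointer doubles as the "initialized" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_WAIT_TABLE_ENTRIES 8 /* arbitrary size for this model */

/* Hypothetical, reduced stand-in for a zone. */
struct model_zone {
    int *wait_table;
    unsigned int init_count; /* illustration only */
};

/* Same test as in the commit message: the pointer is the flag. */
static bool zone_is_initialized(const struct model_zone *zone)
{
    return zone->wait_table != NULL;
}

/* Allocates the wait table unless the zone already looks initialized. */
static bool zone_init_if_needed(struct model_zone *zone)
{
    assert(zone != NULL);
    if (zone_is_initialized(zone))
        return true;
    zone->wait_table = calloc(MODEL_WAIT_TABLE_ENTRIES,
                              sizeof(*zone->wait_table));
    if (!zone->wait_table)
        return false;
    zone->init_count++;
    return true;
}

/* Frees the table and clears the pointer so a re-add initializes again. */
static void zone_offline(struct model_zone *zone)
{
    free(zone->wait_table);
    zone->wait_table = NULL; /* without this, the stale pointer reads as "initialized" */
}
```

If the assignment to NULL in `zone_offline()` were dropped, the second `zone_init_if_needed()` call would return early and leave the zone pointing at freed memory, which is the failure mode the oops above reports.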
  2. 09 Jun, 2015 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5879ae5f
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix stack allocation in s390 BPF JIT, from Michael Holzheu.
      
       2) Disable LRO on openvswitch paths, from Jiri Benc.
      
       3) UDP early demux doesn't handle multicast group membership properly,
          fix from Shawn Bohrer.
      
       4) Fix TX queue hang due to incorrect handling of mixed sized fragments
          and linearization in i40e driver, from Anjali Singhai Jain.
      
       5) Cannot use disable_irq() in timer handler of AMD xgbe driver, from
          Thomas Lendacky.
      
       6) be2net driver improperly assumes pci_alloc_consistent() gives zero'd
          out memory, use dma_zalloc_coherent().  From Sriharsha Basavapatna.
      
       7) Fix use-after-free in MPLS and ipv6, from Robert Shearman.
      
       8) Missing netif_napi_del() calls in cleanup paths of b44 driver, from
          Hauke Mehrtens.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: replace last open coded skb_orphan_frags with function call
        net: bcmgenet: power on MII block for all MII modes
        ipv6: Fix protocol resubmission
        ipv6: fix possible use after free of dev stats
        b44: call netif_napi_del()
        bridge: disable softirqs around br_fdb_update to avoid lockup
        Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup"
        mpls: fix possible use after free of device
        be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent()
        bridge: use _bh spinlock variant for br_fdb_update to avoid lockup
        amd-xgbe: Use disable_irq_nosync from within timer function
        rhashtable: add missing import <linux/export.h>
        i40e: Make sure to be in VEB mode if SRIOV is enabled at probe
        i40e: start up in VEPA mode by default
        i40e/i40evf: Fix mixed size frags and linearization
        ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()
        openvswitch: disable LRO
        s390/bpf: fix bpf frame pointer setup
        s390/bpf: fix stack allocation
  3. 08 Jun, 2015 15 commits
  4. 07 Jun, 2015 10 commits
  5. 06 Jun, 2015 7 commits