1. 28 Jun, 2015 32 commits
  2. 22 Jun, 2015 1 commit
    • Mel Gorman's avatar
      sched, numa: Do not hint for NUMA balancing on VM_MIXEDMAP mappings · 90b934b1
      Mel Gorman authored
      commit 8e76d4ee upstream.
      
      Jovi Zhangwei reported the following problem
      
        Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
        with GFP_COMP flag.
      
        [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping:          (null) index:0x0
        [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
        [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page))
        [Mon May 25 05:29:33 2015] ------------[ cut here ]------------
        [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
        [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP
      
      In this case it was triggered by running tcpdump but it's not necessary
      reproducible on all systems.
      
        sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap
      
      Compound pages cannot be migrated and it was not expected that such pages
      be marked for NUMA balancing.  This did not take into account that drivers
      such as net/packet/af_packet.c may insert compound pages into userspace
      with vm_insert_page.  This patch tells the NUMA balancing protection
      scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
      compound pages are marked for migration.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reported-by: default avatarJovi Zhangwei <jovi@cloudflare.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [jovi: Backported to 3.18: adjust context]
      Signed-off-by: default avatarJovi Zhangwei <jovi@cloudflare.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      90b934b1
  3. 15 Jun, 2015 7 commits
    • Ilya Dryomov's avatar
      crush: ensuring at most num-rep osds are selected · 3ca9f5f9
      Ilya Dryomov authored
      [ Upstream commit 45002267 ]
      
      Crush temporary buffers are allocated as per replica size configured
      by the user.  When there are more final osds (to be selected as per
      rule) than the replicas, buffer overlaps and it causes crash.  Now, it
      ensures that at most num-rep osds are selected even if more number of
      osds are allowed by the rule.
      
      Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
                                234b066ba04976783d15ff2abc3e81b6cc06fb10.
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3ca9f5f9
    • Nikolay Aleksandrov's avatar
      bridge: disable softirqs around br_fdb_update to avoid lockup · b824a7f0
      Nikolay Aleksandrov authored
      [ Upstream commit c4c832f8 ]
      
      br_fdb_update() can be called in process context in the following way:
      br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
      so we need to disable softirqs because there are softirq users of the
      hash_lock. One easy way to reproduce this is to modify the bridge utility
      to set NTF_USE, enable stp and then set maxageing to a low value so
      br_fdb_cleanup() is called frequently and then just add new entries in
      a loop. This happens because br_fdb_cleanup() is called from timer/softirq
      context. The spin locks in br_fdb_update were _bh before commit f8ae737d
      ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
      and at the time that commit was correct because br_fdb_update() couldn't be
      called from process context, but that changed after commit:
      292d1398 ("bridge: add NTF_USE support")
      Using local_bh_disable/enable around br_fdb_update() allows us to keep
      using the spin_lock/unlock in br_fdb_update for the fast-path.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: 292d1398 ("bridge: add NTF_USE support")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b824a7f0
    • Sriharsha Basavapatna's avatar
      be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent() · f938f18c
      Sriharsha Basavapatna authored
      [ Upstream commit e51000db ]
      
      There are several places in the driver (all in control paths) where
      coherent dma memory is being allocated using either dma_alloc_coherent()
      or the deprecated pci_alloc_consistent(). All these calls should be
      changed to use dma_zalloc_coherent() to avoid uninitialized fields in
      data structures backed by this memory.
      Reported-by: default avatarJoerg Roedel <jroedel@suse.de>
      Tested-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f938f18c
    • Shawn Bohrer's avatar
      ipv4/udp: Verify multicast group is ours in upd_v4_early_demux() · eee3f329
      Shawn Bohrer authored
      [ Upstream commit 6e540309 ]
      
      421b3885 "udp: ipv4: Add udp early
      demux" introduced a regression that allowed sockets bound to INADDR_ANY
      to receive packets from multicast groups that the socket had not joined.
      For example a socket that had joined 224.168.2.9 could also receive
      packets from 225.168.2.9 despite not having joined that group if
      ip_early_demux is enabled.
      
      Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
      that the multicast packet is indeed ours.
      Signed-off-by: default avatarShawn Bohrer <sbohrer@rgmadvisors.com>
      Reported-by: default avatarYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eee3f329
    • Ian Campbell's avatar
      xen: netback: read hotplug script once at start of day. · fe38ed61
      Ian Campbell authored
      [ Upstream commit 31a41898 ]
      
      When we come to tear things down in netback_remove() and generate the
      uevent it is possible that the xenstore directory has already been
      removed (details below).
      
      In such cases netback_uevent() won't be able to read the hotplug
      script and will write a xenstore error node.
      
      A recent change to the hypervisor exposed this race such that we now
      sometimes lose it (where apparently we didn't ever before).
      
      Instead read the hotplug script configuration during setup and use it
      for the lifetime of the backend device.
      
      The apparently more obvious fix of moving the transition to
      state=Closed in netback_remove() to after the uevent does not work
      because it is possible that we are already in state=Closed (in
      reaction to the guest having disconnected as it shutdown). Being
      already in Closed means the toolstack is at liberty to start tearing
      down the xenstore directories. In principal it might be possible to
      arrange to unregister the device sooner (e.g on transition to Closing)
      such that xenstore would still be there but this state machine is
      fragile and prone to anger...
      
      A modern Xen system only relies on the hotplug uevent for driver
      domains, when the backend is in the same domain as the toolstack it
      will run the necessary setup/teardown directly in the correct sequence
      wrt xenstore changes.
      Signed-off-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fe38ed61
    • Neal Cardwell's avatar
      tcp: fix child sockets to use system default congestion control if not set · 189debb5
      Neal Cardwell authored
      [ Upstream commit 9f950415 ]
      
      Linux 3.17 and earlier are explicitly engineered so that if the app
      doesn't specifically request a CC module on a listener before the SYN
      arrives, then the child gets the system default CC when the connection
      is established. See tcp_init_congestion_control() in 3.17 or earlier,
      which says "if no choice made yet assign the current value set as
      default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
      created") altered these semantics, so that children got their parent
      listener's congestion control even if the system default had changed
      after the listener was created.
      
      This commit returns to those original semantics from 3.17 and earlier,
      since they are the original semantics from 2007 in 4d4d3d1e ("[TCP]:
      Congestion control initialization."), and some Linux congestion
      control workflows depend on that.
      
      In summary, if a listener socket specifically sets TCP_CONGESTION to
      "x", or the route locks the CC module to "x", then the child gets
      "x". Otherwise the child gets current system default from
      net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
      earlier, and this commit restores that.
      
      Fixes: 55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      189debb5
    • Eric Dumazet's avatar
      udp: fix behavior of wrong checksums · ee4ab7d8
      Eric Dumazet authored
      [ Upstream commit beb39db5 ]
      
      We have two problems in UDP stack related to bogus checksums :
      
      1) We return -EAGAIN to application even if receive queue is not empty.
         This breaks applications using edge trigger epoll()
      
      2) Under UDP flood, we can loop forever without yielding to other
         processes, potentially hanging the host, especially on non SMP.
      
      This patch is an attempt to make things better.
      
      We might in the future add extra support for rt applications
      wanting to better control time spent doing a recv() in a hostile
      environment. For example we could validate checksums before queuing
      packets in socket receive queue.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ee4ab7d8