1. 28 Jun, 2015 16 commits
  2. 22 Jun, 2015 1 commit
    • Mel Gorman's avatar
      sched, numa: Do not hint for NUMA balancing on VM_MIXEDMAP mappings · 90b934b1
      Mel Gorman authored
      commit 8e76d4ee upstream.
      
      Jovi Zhangwei reported the following problem
      
        Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
        with GFP_COMP flag.
      
        [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping:          (null) index:0x0
        [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
        [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page))
        [Mon May 25 05:29:33 2015] ------------[ cut here ]------------
        [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
        [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP
      
      In this case it was triggered by running tcpdump but it's not necessary
      reproducible on all systems.
      
        sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap
      
      Compound pages cannot be migrated and it was not expected that such pages
      be marked for NUMA balancing.  This did not take into account that drivers
      such as net/packet/af_packet.c may insert compound pages into userspace
      with vm_insert_page.  This patch tells the NUMA balancing protection
      scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
      compound pages are marked for migration.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reported-by: default avatarJovi Zhangwei <jovi@cloudflare.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [jovi: Backported to 3.18: adjust context]
      Signed-off-by: default avatarJovi Zhangwei <jovi@cloudflare.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      90b934b1
  3. 15 Jun, 2015 23 commits
    • Ilya Dryomov's avatar
      crush: ensuring at most num-rep osds are selected · 3ca9f5f9
      Ilya Dryomov authored
      [ Upstream commit 45002267 ]
      
      Crush temporary buffers are allocated as per replica size configured
      by the user.  When there are more final osds (to be selected as per
      rule) than the replicas, buffer overlaps and it causes crash.  Now, it
      ensures that at most num-rep osds are selected even if more number of
      osds are allowed by the rule.
      
      Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
                                234b066ba04976783d15ff2abc3e81b6cc06fb10.
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3ca9f5f9
    • Nikolay Aleksandrov's avatar
      bridge: disable softirqs around br_fdb_update to avoid lockup · b824a7f0
      Nikolay Aleksandrov authored
      [ Upstream commit c4c832f8 ]
      
      br_fdb_update() can be called in process context in the following way:
      br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
      so we need to disable softirqs because there are softirq users of the
      hash_lock. One easy way to reproduce this is to modify the bridge utility
      to set NTF_USE, enable stp and then set maxageing to a low value so
      br_fdb_cleanup() is called frequently and then just add new entries in
      a loop. This happens because br_fdb_cleanup() is called from timer/softirq
      context. The spin locks in br_fdb_update were _bh before commit f8ae737d
      ("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
      and at the time that commit was correct because br_fdb_update() couldn't be
      called from process context, but that changed after commit:
      292d1398 ("bridge: add NTF_USE support")
      Using local_bh_disable/enable around br_fdb_update() allows us to keep
      using the spin_lock/unlock in br_fdb_update for the fast-path.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: 292d1398 ("bridge: add NTF_USE support")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b824a7f0
    • Sriharsha Basavapatna's avatar
      be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent() · f938f18c
      Sriharsha Basavapatna authored
      [ Upstream commit e51000db ]
      
      There are several places in the driver (all in control paths) where
      coherent dma memory is being allocated using either dma_alloc_coherent()
      or the deprecated pci_alloc_consistent(). All these calls should be
      changed to use dma_zalloc_coherent() to avoid uninitialized fields in
      data structures backed by this memory.
      Reported-by: default avatarJoerg Roedel <jroedel@suse.de>
      Tested-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f938f18c
    • Shawn Bohrer's avatar
      ipv4/udp: Verify multicast group is ours in upd_v4_early_demux() · eee3f329
      Shawn Bohrer authored
      [ Upstream commit 6e540309 ]
      
      421b3885 "udp: ipv4: Add udp early
      demux" introduced a regression that allowed sockets bound to INADDR_ANY
      to receive packets from multicast groups that the socket had not joined.
      For example a socket that had joined 224.168.2.9 could also receive
      packets from 225.168.2.9 despite not having joined that group if
      ip_early_demux is enabled.
      
      Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
      that the multicast packet is indeed ours.
      Signed-off-by: default avatarShawn Bohrer <sbohrer@rgmadvisors.com>
      Reported-by: default avatarYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eee3f329
    • Ian Campbell's avatar
      xen: netback: read hotplug script once at start of day. · fe38ed61
      Ian Campbell authored
      [ Upstream commit 31a41898 ]
      
      When we come to tear things down in netback_remove() and generate the
      uevent it is possible that the xenstore directory has already been
      removed (details below).
      
      In such cases netback_uevent() won't be able to read the hotplug
      script and will write a xenstore error node.
      
      A recent change to the hypervisor exposed this race such that we now
      sometimes lose it (where apparently we didn't ever before).
      
      Instead read the hotplug script configuration during setup and use it
      for the lifetime of the backend device.
      
      The apparently more obvious fix of moving the transition to
      state=Closed in netback_remove() to after the uevent does not work
      because it is possible that we are already in state=Closed (in
      reaction to the guest having disconnected as it shutdown). Being
      already in Closed means the toolstack is at liberty to start tearing
      down the xenstore directories. In principal it might be possible to
      arrange to unregister the device sooner (e.g on transition to Closing)
      such that xenstore would still be there but this state machine is
      fragile and prone to anger...
      
      A modern Xen system only relies on the hotplug uevent for driver
      domains, when the backend is in the same domain as the toolstack it
      will run the necessary setup/teardown directly in the correct sequence
      wrt xenstore changes.
      Signed-off-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fe38ed61
    • Neal Cardwell's avatar
      tcp: fix child sockets to use system default congestion control if not set · 189debb5
      Neal Cardwell authored
      [ Upstream commit 9f950415 ]
      
      Linux 3.17 and earlier are explicitly engineered so that if the app
      doesn't specifically request a CC module on a listener before the SYN
      arrives, then the child gets the system default CC when the connection
      is established. See tcp_init_congestion_control() in 3.17 or earlier,
      which says "if no choice made yet assign the current value set as
      default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
      created") altered these semantics, so that children got their parent
      listener's congestion control even if the system default had changed
      after the listener was created.
      
      This commit returns to those original semantics from 3.17 and earlier,
      since they are the original semantics from 2007 in 4d4d3d1e ("[TCP]:
      Congestion control initialization."), and some Linux congestion
      control workflows depend on that.
      
      In summary, if a listener socket specifically sets TCP_CONGESTION to
      "x", or the route locks the CC module to "x", then the child gets
      "x". Otherwise the child gets current system default from
      net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
      earlier, and this commit restores that.
      
      Fixes: 55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      189debb5
    • Eric Dumazet's avatar
      udp: fix behavior of wrong checksums · ee4ab7d8
      Eric Dumazet authored
      [ Upstream commit beb39db5 ]
      
      We have two problems in UDP stack related to bogus checksums :
      
      1) We return -EAGAIN to application even if receive queue is not empty.
         This breaks applications using edge trigger epoll()
      
      2) Under UDP flood, we can loop forever without yielding to other
         processes, potentially hanging the host, especially on non SMP.
      
      This patch is an attempt to make things better.
      
      We might in the future add extra support for rt applications
      wanting to better control time spent doing a recv() in a hostile
      environment. For example we could validate checksums before queuing
      packets in socket receive queue.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ee4ab7d8
    • Eric Dumazet's avatar
      bridge: fix br_multicast_query_expired() bug · f1394a1d
      Eric Dumazet authored
      [ Upstream commit 71d9f614 ]
      
      br_multicast_query_expired() querier argument is a pointer to
      a struct bridge_mcast_querier :
      
      struct bridge_mcast_querier {
              struct br_ip addr;
              struct net_bridge_port __rcu    *port;
      };
      
      Intent of the code was to clear port field, not the pointer to querier.
      
      Fixes: 2cd41431 ("bridge: memorize and export selected IGMP/MLD querier port")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Acked-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Cc: Linus Lüssing <linus.luessing@web.de>
      Cc: Steinar H. Gunderson <sesse@samfundet.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f1394a1d
    • Jason Gunthorpe's avatar
      sctp: Fix mangled IPv4 addresses on a IPv6 listening socket · a36102b2
      Jason Gunthorpe authored
      [ Upstream commit 9302d7bb ]
      
      sctp_v4_map_v6 was subtly writing and reading from members
      of a union in a way the clobbered data it needed to read before
      it read it.
      
      Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
      that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
      result.
      
      Reorder things to guarantee correct behaviour no matter what the
      union layout is.
      
      This impacts user space clients that open an IPv6 SCTP socket and
      receive IPv4 connections. Prior to 299ee user space would see a
      sockaddr with AF_INET and a correct address, after 299ee the sockaddr
      is AF_INET6, but the address is wrong.
      
      Fixes: 299ee123 (sctp: Fixup v4mapped behaviour to comply with Sock API)
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a36102b2
    • WANG Cong's avatar
      net_sched: invoke ->attach() after setting dev->qdisc · 4b72bd18
      WANG Cong authored
      [ Upstream commit 86e363dc ]
      
      For mq qdisc, we add per tx queue qdisc to root qdisc
      for display purpose, however, that happens too early,
      before the new dev->qdisc is finally set, this causes
      q->list points to an old root qdisc which is going to be
      freed right before assigning with a new one.
      
      Fix this by moving ->attach() after setting dev->qdisc.
      
      For the record, this fixes the following crash:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
       list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
       CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
        ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
        ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
       Call Trace:
        [<ffffffff81a44e7f>] dump_stack+0x4c/0x65
        [<ffffffff810790da>] warn_slowpath_common+0x9c/0xb6
        [<ffffffff814e725b>] ? __list_del_entry+0x5a/0x98
        [<ffffffff81079162>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff81820eb0>] ? dev_graft_qdisc+0x5e/0x6a
        [<ffffffff814e725b>] __list_del_entry+0x5a/0x98
        [<ffffffff814e72a7>] list_del+0xe/0x2d
        [<ffffffff81822f05>] qdisc_list_del+0x1e/0x20
        [<ffffffff81820cd1>] qdisc_destroy+0x30/0xd6
        [<ffffffff81822676>] qdisc_graft+0x11d/0x243
        [<ffffffff818233c1>] tc_get_qdisc+0x1a6/0x1d4
        [<ffffffff810b5eaf>] ? mark_lock+0x2e/0x226
        [<ffffffff817ff8f5>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff774>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff81855dc6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817ff756>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff818544b2>] netlink_unicast+0xcb/0x150
        [<ffffffff81161db9>] ? might_fault+0x59/0xa9
        [<ffffffff81854f78>] netlink_sendmsg+0x4fa/0x51c
        [<ffffffff817d6e09>] sock_sendmsg_nosec+0x12/0x1d
        [<ffffffff817d8967>] sock_sendmsg+0x29/0x2e
        [<ffffffff817d8cf3>] ___sys_sendmsg+0x1b4/0x23a
        [<ffffffff8100a1b8>] ? native_sched_clock+0x35/0x37
        [<ffffffff810a1d83>] ? sched_clock_local+0x12/0x72
        [<ffffffff810a1fd4>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810def2a>] ? current_kernel_time+0xe/0x32
        [<ffffffff810b4bc5>] ? lock_release_holdtime.part.29+0x71/0x7f
        [<ffffffff810ddebf>] ? read_seqcount_begin.constprop.27+0x5f/0x76
        [<ffffffff810b6292>] ? trace_hardirqs_on_caller+0x17d/0x199
        [<ffffffff811b14d5>] ? __fget_light+0x50/0x78
        [<ffffffff817d9808>] __sys_sendmsg+0x42/0x60
        [<ffffffff817d9838>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a50e97>] system_call_fastpath+0x12/0x6f
       ---[ end trace ef29d3fb28e97ae7 ]---
      
      For long term, we probably need to clean up the qdisc_graft() code
      in case it hides other bugs like this.
      
      Fixes: 95dc1929 ("pkt_sched: give visibility to mq slave qdiscs")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4b72bd18
    • Ross Lagerwall's avatar
      xen/netback: Properly initialize credit_bytes · 688497a2
      Ross Lagerwall authored
      [ Upstream commit ce0e5c52 ]
      
      Commit e9ce7cb6 ("xen-netback: Factor queue-specific data into queue
      struct") introduced a regression when moving queue-specific data into
      the queue struct by failing to set the credit_bytes field. This
      prevented bandwidth limiting from working. Initialize the field as it
      was done before multiqueue support was added.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      688497a2
    • Mark Salyzyn's avatar
      unix/caif: sk_socket can disappear when state is unlocked · b991285c
      Mark Salyzyn authored
      [ Upstream commit b48732e4 ]
      
      got a rare NULL pointer dereference in clear_bit
      Signed-off-by: default avatarMark Salyzyn <salyzyn@android.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      ----
      v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
      v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b991285c
    • Richard Cochran's avatar
      net: dp83640: fix improper double spin locking. · 21efd84e
      Richard Cochran authored
      [ Upstream commit adbe088f ]
      
      A pair of nested spin locks was introduced in commit 63502b8d
      "dp83640: Fix receive timestamp race condition".
      
      Unfortunately the 'flags' parameter was reused for the inner lock,
      clobbering the originally saved IRQ state.  This patch fixes the issue
      by changing the inner lock to plain spin_lock without irqsave.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      21efd84e
    • Richard Cochran's avatar
      net: dp83640: reinforce locking rules. · adbea7ba
      Richard Cochran authored
      [ Upstream commit a935865c ]
      
      Callers of the ext_write function are supposed to hold a mutex that
      protects the state of the dialed page, but one caller was missing the
      lock from the very start, and over time the code has been changed
      without following the rule.  This patch cleans up the call sites in
      violation of the rule.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      adbea7ba
    • Richard Cochran's avatar
      net: dp83640: fix broken calibration routine. · 33f099e8
      Richard Cochran authored
      [ Upstream commit 397a253a ]
      
      Currently, the calibration function that corrects the initial offsets
      among multiple devices only works the first time.  If the function is
      called more than once, the calibration fails and bogus offsets will be
      programmed into the devices.
      
      In a well hidden spot, the device documentation tells that trigger indexes
      0 and 1 are special in allowing the TRIG_IF_LATE flag to actually work.
      
      This patch fixes the issue by using one of the special triggers during the
      recalibration method.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      33f099e8
    • Thadeu Lima de Souza Cascardo's avatar
      bridge: fix parsing of MLDv2 reports · b1c6cc17
      Thadeu Lima de Souza Cascardo authored
      [ Upstream commit 47cc84ce ]
      
      When more than a multicast address is present in a MLDv2 report, all but
      the first address is ignored, because the code breaks out of the loop if
      there has not been an error adding that address.
      
      This has caused failures when two guests connected through the bridge
      tried to communicate using IPv6. Neighbor discoveries would not be
      transmitted to the other guest when both used a link-local address and a
      static address.
      
      This only happens when there is a MLDv2 querier in the network.
      
      The fix will only break out of the loop when there is a failure adding a
      multicast address.
      
      The mdb before the patch:
      
      dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
      dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
      dev ovirtmgmt port bond0.86 grp ff02::2 temp
      
      After the patch:
      
      dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
      dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
      dev ovirtmgmt port bond0.86 grp ff02::fb temp
      dev ovirtmgmt port bond0.86 grp ff02::2 temp
      dev ovirtmgmt port bond0.86 grp ff02::d temp
      dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
      dev ovirtmgmt port bond0.86 grp ff02::16 temp
      dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
      dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
      dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
      
      Fixes: 08b202b6 ("bridge br_multicast: IPv6 MLD support.")
      Reported-by: default avatarRik Theys <Rik.Theys@esat.kuleuven.be>
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Tested-by: default avatarRik Theys <Rik.Theys@esat.kuleuven.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b1c6cc17
    • Bjørn Mork's avatar
      cdc_ncm: Fix tx_bytes statistics · 842e6e8f
      Bjørn Mork authored
      [ Upstream commit 44f6731d ]
      
      The tx_curr_frame_payload field is u32. When we try to calculate a
      small negative delta based on it, we end up with a positive integer
      close to 2^32 instead.  So the tx_bytes pointer increases by about
      2^32 for every transmitted frame.
      
      Fix by calculating the delta as a signed long.
      
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Reported-by: default avatarFlorian Bruhin <me@the-compiler.org>
      Fixes: 7a1e890e ("usbnet: Fix tx_bytes statistic running backward in cdc_ncm")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      842e6e8f
    • Eric W. Biederman's avatar
      ipv4: Avoid crashing in ip_error · fd92b07c
      Eric W. Biederman authored
      [ Upstream commit 381c759d ]
      
      ip_error does not check if in_dev is NULL before dereferencing it.
      
      IThe following sequence of calls is possible:
      CPU A                          CPU B
      ip_rcv_finish
          ip_route_input_noref()
              ip_route_input_slow()
                                     inetdev_destroy()
          dst_input()
      
      With the result that a network device can be destroyed while processing
      an input packet.
      
      A crash was triggered with only unicast packets in flight, and
      forwarding enabled on the only network device.   The error condition
      was created by the removal of the network device.
      
      As such it is likely the that error code was -EHOSTUNREACH, and the
      action taken by ip_error (if in_dev had been accessible) would have
      been to not increment any counters and to have tried and likely failed
      to send an icmp error as the network device is going away.
      
      Therefore handle this weird case by just dropping the packet if
      !in_dev.  It will result in dropping the packet sooner, and will not
      result in an actual change of behavior.
      
      Fixes: 251da413 ("ipv4: Cache ip_error() routes even when not forwarding.")
      Reported-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Tested-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fd92b07c
    • Daniel Borkmann's avatar
      net: sched: fix call_rcu() race on classifier module unloads · f9a17e86
      Daniel Borkmann authored
      [ Upstream commit c78e1746 ]
      
      Vijay reported that a loop as simple as ...
      
        while true; do
          tc qdisc add dev foo root handle 1: prio
          tc filter add dev foo parent 1: u32 match u32 0 0  flowid 1
          tc qdisc del dev foo root
          rmmod cls_u32
        done
      
      ... will panic the kernel. Moreover, he bisected the change
      apparently introducing it to 78fd1d0a ("netlink: Re-add
      locking to netlink_lookup() and seq walker").
      
      The removal of synchronize_net() from the netlink socket
      triggering the qdisc to be removed, seems to have uncovered
      an RCU resp. module reference count race from the tc API.
      Given that RCU conversion was done after e341694e ("netlink:
      Convert netlink_lookup() to use RCU protected hash table")
      which added the synchronize_net() originally, occasion of
      hitting the bug was less likely (not impossible though):
      
      When qdiscs that i) support attaching classifiers and,
      ii) have at least one of them attached, get deleted, they
      invoke tcf_destroy_chain(), and thus call into ->destroy()
      handler from a classifier module.
      
      After RCU conversion, all classifier that have an internal
      prio list, unlink them and initiate freeing via call_rcu()
      deferral.
      
      Meanhile, tcf_destroy() releases already reference to the
      tp->ops->owner module before the queued RCU callback handler
      has been invoked.
      
      Subsequent rmmod on the classifier module is then not prevented
      since all module references are already dropped.
      
      By the time, the kernel invokes the RCU callback handler from
      the module, that function address is then invalid.
      
      One way to fix it would be to add an rcu_barrier() to
      unregister_tcf_proto_ops() to wait for all pending call_rcu()s
      to complete.
      
      synchronize_rcu() is not appropriate as under heavy RCU
      callback load, registered call_rcu()s could be deferred
      longer than a grace period. In case we don't have any pending
      call_rcu()s, the barrier is allowed to return immediately.
      
      Since we came here via unregister_tcf_proto_ops(), there
      are no users of a given classifier anymore. Further nested
      call_rcu()s pointing into the module space are not being
      done anywhere.
      
      Only cls_bpf_delete_prog() may schedule a work item, to
      unlock pages eventually, but that is not in the range/context
      of cls_bpf anymore.
      
      Fixes: 25d8c0d5 ("net: rcu-ify tcf_proto")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Reported-by: default avatarVijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Tested-by: default avatarVijay Subramanian <subramanian.vijay@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f9a17e86
    • Henning Rogge's avatar
      net/ipv6/udp: Fix ipv6 multicast socket filter regression · 359aeb0a
      Henning Rogge authored
      [ Upstream commit 33b4b015 ]
      
      Commit <5cf3d461> ("udp: Simplify__udp*_lib_mcast_deliver")
      simplified the filter for incoming IPv6 multicast but removed
      the check of the local socket address and the UDP destination
      address.
      
      This patch restores the filter to prevent sockets bound to a IPv6
      multicast IP to receive other UDP traffic link unicast.
      Signed-off-by: default avatarHenning Rogge <hrogge@gmail.com>
      Fixes: 5cf3d461 ("udp: Simplify__udp*_lib_mcast_deliver")
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      359aeb0a
    • Florent Fourcot's avatar
      tcp/ipv6: fix flow label setting in TIME_WAIT state · dc3c6cb6
      Florent Fourcot authored
      [ Upstream commit 21858cd0 ]
      
      commit 1d13a96c ("ipv6: tcp: fix flowlabel value in ACK messages
      send from TIME_WAIT") added the flow label in the last TCP packets.
      Unfortunately, it was not casted properly.
      
      This patch replace the buggy shift with be32_to_cpu/cpu_to_be32.
      
      Fixes: 1d13a96c ("ipv6: tcp: fix flowlabel value in ACK messages")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarFlorent Fourcot <florent.fourcot@enst-bretagne.fr>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dc3c6cb6
    • Nicolas Dichtel's avatar
      rtnl/bond: don't send rtnl msg for unregistered iface · 984ff7a3
      Nicolas Dichtel authored
      [ Upstream commit ed2a80ab ]
      
      Before the patch, the command 'ip link add bond2 type bond mode 802.3ad'
      causes the kernel to send a rtnl message for the bond2 interface, with an
      ifindex 0.
      
      'ip monitor' shows:
      0: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 state DOWN group default
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
      9: bond2@NONE: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default
          link/ether ea:3e:1f:53:92:7b brd ff:ff:ff:ff:ff:ff
      [snip]
      
      The patch fixes the spotted bug by checking in bond driver if the interface
      is registered before calling the notifier chain.
      It also adds a check in rtmsg_ifinfo() to prevent this kind of bug in the
      future.
      
      Fixes: d4261e56 ("bonding: create netlink event when bonding option is changed")
      CC: Jiri Pirko <jiri@resnulli.us>
      Reported-by: default avatarJulien Meunier <julien.meunier@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      984ff7a3
    • Florian Fainelli's avatar
      net: phy: Allow EEE for all RGMII variants · b23b97ca
      Florian Fainelli authored
      [ Upstream commit 7e140696 ]
      
      RGMII interfaces come in multiple flavors: RGMII with transmit or
      receive internal delay, no delays at all, or delays in both direction.
      
      This change extends the initial check for PHY_INTERFACE_MODE_RGMII to
      cover all of these variants since EEE should be allowed for any of these
      modes, since it is a property of the RGMII, hence Gigabit PHY capability
      more than the RGMII electrical interface and its delays.
      
      Fixes: a59a4d19 ("phy: add the EEE support and the way to access to the MMD registers")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b23b97ca