1. 19 Aug, 2016 5 commits
    • Eric Dumazet's avatar
      tcp: fix use after free in tcp_xmit_retransmit_queue() · bb1fceca
      Eric Dumazet authored
      When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
      tail of the write queue using tcp_add_write_queue_tail()
      
      Then it attempts to copy user data into this fresh skb.
      
      If the copy fails, we undo the work and remove the fresh skb.
      
      Unfortunately, this undo lacks the change done to tp->highest_sack and
      we can leave a dangling pointer (to a freed skb)
      
      Later, tcp_xmit_retransmit_queue() can dereference this pointer and
      access freed memory. For regular kernels where memory is not unmapped,
      this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
      returning garbage instead of tp->snd_nxt, but with various debug
      features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
      
      This bug was found by Marco Grassi thanks to syzkaller.
      
      Fixes: 6859d494 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
      Reported-by: default avatarMarco Grassi <marco.gra@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Reviewed-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb1fceca
    • Hariprasad Shenai's avatar
      cxgb4: Fixes resource allocation for ULD's in kdump kernel · e0d8b290
      Hariprasad Shenai authored
      At present the code to check in kdump kernel was not disabling
      allocation of resources when CONFIG_CHELSIO_T4_DCB is defined, move the
      code outside #defines so that it gets disabled irrespective of #define,
      when in kdump kernel.
      Signed-off-by: default avatarHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0d8b290
    • David Daney's avatar
      net: thunderx: Fix OOPs with ethtool --register-dump · 1423661f
      David Daney authored
      The ethtool_ops .get_regs function attempts to read the nonexistent
      register NIC_QSET_SQ_0_7_CNM_CHG, which produces a "bus error" type
      OOPs.
      
      Fix by not attempting to read, and removing the definition of,
      NIC_QSET_SQ_0_7_CNM_CHG.  A zero is written into the register dump to
      keep the layout unchanged.
      Signed-off-by: default avatarDavid Daney <david.daney@cavium.com>
      Cc: <stable@vger.kernel.org> # 4.4.x-
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1423661f
    • Yuval Mintz's avatar
      qede: Fix Tx timeout due to xmit_more · 039a3927
      Yuval Mintz authored
      Driver uses netif_tx_queue_stopped() to make sure the xmit_more
      indication will be honored, but that only checks for DRV_XOFF.
      
      At the same time, it's possible that during transmission the DQL will
      close the transmission queue with STACK_XOFF indication.
      In re-configuration flows, when the threshold is relatively low, it's
      possible that the device has no pending tranmissions, and during
      tranmission the driver would miss doorbelling the HW.
      Since there are no pending transmission, there will never be a Tx
      completion [and thus the DQL would not remove the STACK_XOFF indication],
      eventually causing the Tx queue to timeout.
      
      While we're at it - also doorbell in case driver has to close the
      transmission queue on its own [although this one is less important -
      if the ring is full, we're bound to receive completion eventually,
      which means the doorbell would only be postponed and not indefinetly
      blocked].
      
      Fixes: 312e0676 ("qede: Utilize xmit_more")
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      039a3927
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 53409afd
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter updates for your net tree,
      they are:
      
      1) Dump only conntrack that belong to this namespace via /proc file.
         This is some fallout from the conversion to single conntrack table
         for all netns, patch from Liping Zhang.
      
      2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents
         module autoloading, also from Liping Zhang.
      
      3) Report overquota event to the right netnamespace, again from Liping.
      
      4) Fix tproxy listener sk refcount that leads to crash, from
         Eric Dumazet.
      
      5) Fix racy refcounting on object deletion from nfnetlink and rule
         removal both for nfacct and cttimeout, from Liping Zhang.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53409afd
  2. 18 Aug, 2016 3 commits
    • Liping Zhang's avatar
      netfilter: cttimeout: fix use after free error when delete netns · b75911b6
      Liping Zhang authored
      In general, when we want to delete a netns, cttimeout_net_exit will
      be called before ipt_unregister_table, i.e. before ctnl_timeout_put.
      
      But after call kfree_rcu in cttimeout_net_exit, we will still decrease
      the timeout object's refcnt in ctnl_timeout_put, this is incorrect,
      and will cause a use after free error.
      
      It is easy to reproduce this problem:
        # while : ; do
        ip netns add xxx
        ip netns exec xxx nfct add timeout testx inet icmp timeout 200
        ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
        ip netns del xxx
        done
      
        =======================================================================
        BUG kmalloc-96 (Tainted: G    B       E  ): Poison overwritten
        -----------------------------------------------------------------------
        INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
        0x6b
        INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
        age=104 cpu=0 pid=3330
        ___slab_alloc+0x4da/0x540
        __slab_alloc+0x20/0x40
        __kmalloc+0x1c8/0x240
        cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
        nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
        [ ... ]
      
      So only when the refcnt decreased to 0, we call kfree_rcu to free the
      timeout object. And like nfnetlink_acct do, use atomic_cmpxchg to
      avoid race between ctnl_timeout_try_del and ctnl_timeout_put.
      Signed-off-by: default avatarLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b75911b6
    • Liping Zhang's avatar
      netfilter: nfnetlink_acct: fix race between nfacct del and xt_nfacct destroy · 12be15dd
      Liping Zhang authored
      Suppose that we input the following commands at first:
        # nfacct add test
        # iptables -A INPUT -m nfacct --nfacct-name test
      
      And now "test" acct's refcnt is 2, but later when we try to delete the
      "test" nfacct and the related iptables rule at the same time, race maybe
      happen:
            CPU0                                    CPU1
        nfnl_acct_try_del                      nfnl_acct_put
        atomic_dec_and_test //ref=1,testfail          -
             -                                 atomic_dec_and_test //ref=0,testok
             -                                 kfree_rcu
        atomic_inc //ref=1                            -
      
      So after the rcu grace period, nf_acct will be freed but it is still linked
      in the nfnl_acct_list, and we can access it later, then oops will happen.
      
      Convert atomic_dec_and_test and atomic_inc combinaiton to one atomic
      operation atomic_cmpxchg here to fix this problem.
      Signed-off-by: default avatarLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      12be15dd
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 184ca823
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
          Fietkau.
      
       2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
      
       3) Fix some tg3 ethtool logic bugs, and one that would cause no
          interrupts to be generated when rx-coalescing is set to 0.  From
          Satish Baddipadige and Siva Reddy Kallam.
      
       4) QLCNIC mailbox corruption and napi budget handling fix from Manish
          Chopra.
      
       5) Fix fib_trie logic when walking the trie during /proc/net/route
          output than can access a stale node pointer.  From David Forster.
      
       6) Several sctp_diag fixes from Phil Sutter.
      
       7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
      
       8) Checksum fixup fixes in bpf from Daniel Borkmann.
      
       9) Memork leaks in nfnetlink, from Liping Zhang.
      
      10) Use after free in rxrpc, from David Howells.
      
      11) Use after free in new skb_array code of macvtap driver, from Jason
          Wang.
      
      12) Calipso resource leak, from Colin Ian King.
      
      13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
      
      14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
      
      15) Fix lockdep splats in macsec, from Sabrina Dubroca.
      
      16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
          handling.
      
      17) Various tc-action bug fixes, from CONG Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
        net_sched: allow flushing tc police actions
        net_sched: unify the init logic for act_police
        net_sched: convert tcf_exts from list to pointer array
        net_sched: move tc offload macros to pkt_cls.h
        net_sched: fix a typo in tc_for_each_action()
        net_sched: remove an unnecessary list_del()
        net_sched: remove the leftover cleanup_a()
        mlxsw: spectrum: Allow packets to be trapped from any PG
        mlxsw: spectrum: Unmap 802.1Q FID before destroying it
        mlxsw: spectrum: Add missing rollbacks in error path
        mlxsw: reg: Fix missing op field fill-up
        mlxsw: spectrum: Trap loop-backed packets
        mlxsw: spectrum: Add missing packet traps
        mlxsw: spectrum: Mark port as active before registering it
        mlxsw: spectrum: Create PVID vPort before registering netdevice
        mlxsw: spectrum: Remove redundant errors from the code
        mlxsw: spectrum: Don't return upon error in removal path
        i40e: check for and deal with non-contiguous TCs
        ixgbe: Re-enable ability to toggle VLAN filtering
        ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
        ...
      184ca823
  3. 17 Aug, 2016 25 commits
  4. 16 Aug, 2016 7 commits