1. 01 Nov, 2017 13 commits
    • Eric Dumazet's avatar
      tcp: fix tcp_mtu_probe() vs highest_sack · 2b7cda9c
      Eric Dumazet authored
      Based on SNMP values provided by Roman, Yuchung made the observation
      that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
      
      Looking at tcp_mtu_probe(), I found that when a new skb was placed
      in front of the write queue, we were not updating tcp highest sack.
      
      If one skb is freed because all its content was copied to the new skb
      (for MTU probing), then tp->highest_sack could point to a now freed skb.
      
      Bad things would then happen, including infinite loops.
      
      This patch renames tcp_highest_sack_combine() and uses it
      from tcp_mtu_probe() to fix the bug.
      
      Note that I also removed one test against tp->sacked_out,
      since we want to replace tp->highest_sack regardless of whatever
      condition, since keeping a stale pointer to freed skb is a recipe
      for disaster.
      
      Fixes: a47e5a98 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAlexei Starovoitov <alexei.starovoitov@gmail.com>
      Reported-by: default avatarRoman Gushchin <guro@fb.com>
      Reported-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2b7cda9c
    • Eric Dumazet's avatar
      ipv6: addrconf: increment ifp refcount before ipv6_del_addr() · e669b869
      Eric Dumazet authored
      In the (unlikely) event fixup_permanent_addr() returns a failure,
      addrconf_permanent_addr() calls ipv6_del_addr() without the
      mandatory call to in6_ifa_hold(), leading to a refcount error,
      spotted by syzkaller :
      
      WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50
      lib/refcount.c:227
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-20171009+ #33
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x41c kernel/panic.c:181
       __warn+0x1c4/0x1e0 kernel/panic.c:544
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
       do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
       invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227
      RSP: 0018:ffff8801ca49e680 EFLAGS: 00010286
      RAX: 000000000000002c RBX: ffff8801d07cfcdc RCX: 0000000000000000
      RDX: 000000000000002c RSI: 1ffff10039493c90 RDI: ffffed0039493cc4
      RBP: ffff8801ca49e688 R08: ffff8801ca49dd70 R09: 0000000000000000
      R10: ffff8801ca49df58 R11: 0000000000000000 R12: 1ffff10039493cd9
      R13: ffff8801ca49e6e8 R14: ffff8801ca49e7e8 R15: ffff8801d07cfcdc
       __in6_ifa_put include/net/addrconf.h:369 [inline]
       ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208
       addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline]
       addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697
       call_netdevice_notifiers net/core/dev.c:1715 [inline]
       __dev_notify_flags+0x15d/0x430 net/core/dev.c:6843
       dev_change_flags+0xf5/0x140 net/core/dev.c:6879
       do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113
       rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661
       rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313
       netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299
       netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049
       __sys_sendmsg+0xe5/0x210 net/socket.c:2083
       SYSC_sendmsg net/socket.c:2094 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2090
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x7fa9174d3320
      RSP: 002b:00007ffe302ae9e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007ffe302b2ae0 RCX: 00007fa9174d3320
      RDX: 0000000000000000 RSI: 00007ffe302aea20 RDI: 0000000000000016
      RBP: 0000000000000082 R08: 0000000000000000 R09: 000000000000000f
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe302b32a0
      R13: 0000000000000000 R14: 00007ffe302b2ab8 R15: 00007ffe302b32b8
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e669b869
    • Craig Gallek's avatar
      tun/tap: sanitize TUNSETSNDBUF input · 93161922
      Craig Gallek authored
      Syzkaller found several variants of the lockup below by setting negative
      values with the TUNSETSNDBUF ioctl.  This patch adds a sanity check
      to both the tun and tap versions of this ioctl.
      
        watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389]
        Modules linked in:
        irq event stamp: 329692056
        hardirqs last  enabled at (329692055): [<ffffffff824b8381>] _raw_spin_unlock_irqrestore+0x31/0x75
        hardirqs last disabled at (329692056): [<ffffffff824b9e58>] apic_timer_interrupt+0x98/0xb0
        softirqs last  enabled at (35659740): [<ffffffff824bc958>] __do_softirq+0x328/0x48c
        softirqs last disabled at (35659731): [<ffffffff811c796c>] irq_exit+0xbc/0xd0
        CPU: 0 PID: 2389 Comm: repro Not tainted 4.14.0-rc7 #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        task: ffff880009452140 task.stack: ffff880006a20000
        RIP: 0010:_raw_spin_lock_irqsave+0x11/0x80
        RSP: 0018:ffff880006a27c50 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
        RAX: ffff880009ac68d0 RBX: ffff880006a27ce0 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff880006a27ce0 RDI: ffff880009ac6900
        RBP: ffff880006a27c60 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000001 R11: 000000000063ff00 R12: ffff880009ac6900
        R13: ffff880006a27cf8 R14: 0000000000000001 R15: ffff880006a27cf8
        FS:  00007f4be4838700(0000) GS:ffff88000cc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000020101000 CR3: 0000000009616000 CR4: 00000000000006f0
        Call Trace:
         prepare_to_wait+0x26/0xc0
         sock_alloc_send_pskb+0x14e/0x270
         ? remove_wait_queue+0x60/0x60
         tun_get_user+0x2cc/0x19d0
         ? __tun_get+0x60/0x1b0
         tun_chr_write_iter+0x57/0x86
         __vfs_write+0x156/0x1e0
         vfs_write+0xf7/0x230
         SyS_write+0x57/0xd0
         entry_SYSCALL_64_fastpath+0x1f/0xbe
        RIP: 0033:0x7f4be4356df9
        RSP: 002b:00007ffc18101c08 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4be4356df9
        RDX: 0000000000000046 RSI: 0000000020101000 RDI: 0000000000000005
        RBP: 00007ffc18101c40 R08: 0000000000000001 R09: 0000000000000001
        R10: 0000000000000001 R11: 0000000000000293 R12: 0000559c75f64780
        R13: 00007ffc18101d30 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: 33dccbb0 ("tun: Limit amount of queued packets per device")
      Fixes: 20d29d7a ("net: macvtap driver")
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93161922
    • Vadim Pasternak's avatar
      mlxsw: i2c: Fix buffer increment counter for write transaction · d70eaa38
      Vadim Pasternak authored
      It fixes a problem for the last chunk where 'chunk_size' is smaller than
      MLXSW_I2C_BLK_MAX and data is copied to the wrong offset, overriding
      previous data.
      
      Fixes: 6882b0ae ("mlxsw: Introduce support for I2C bus")
      Signed-off-by: default avatarVadim Pasternak <vadimp@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d70eaa38
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 122f00cd
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-11-01
      
      1) Fix a memleak when a packet matches a policy
         without a matching state.
      
      2) Reset the socket cached dst_entry when inserting
         a socket policy, otherwise the policy might be
         ignored. From Jonathan Basseri.
      
      3) Fix GSO for a IPsec, GRE tunnel combination.
         We reset the encapsulation field at the skb
         too erly, as a result GRE does not segment
         GSO packets. Fix this by resetting the the
         encapsulation field right before the
         transformation where the inner headers get
         invalid.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      122f00cd
    • Ido Schimmel's avatar
      mlxsw: reg: Add high and low temperature thresholds · 62b0e924
      Ido Schimmel authored
      The ASIC has the ability to generate events whenever a sensor indicates
      the temperature goes above or below its high or low thresholds,
      respectively.
      
      In new firmware versions the firmware enforces a minimum of 5
      degrees Celsius difference between both thresholds. Make the driver
      conform to this requirement.
      
      Note that this is required even when the events are disabled, as in
      certain systems interrupts are generated via GPIO based on these
      thresholds.
      
      Fixes: 85926f87 ("mlxsw: reg: Add definition of temperature management registers")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      62b0e924
    • Yuval Mintz's avatar
      MAINTAINERS: Remove Yotam from mlxfw · 1cf098b7
      Yuval Mintz authored
      Provide a mailing list for maintenance of the module instead.
      Signed-off-by: default avatarYuval Mintz <yuvalm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1cf098b7
    • Yotam Gigi's avatar
      MAINTAINERS: Update Yotam's E-mail · f1fd20c3
      Yotam Gigi authored
      For the time being I will be available in my private mail. Update both the
      MAINTAINERS file and the individual modules MODULE_AUTHOR directive with
      the new address.
      Signed-off-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Signed-off-by: default avatarYuval Mintz <yuvalm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f1fd20c3
    • Pan Bian's avatar
      net: hns: set correct return value · d2083d0e
      Pan Bian authored
      The function of_parse_phandle() returns a NULL pointer if it cannot
      resolve a phandle property to a device_node pointer. In function
      hns_nic_dev_probe(), its return value is passed to PTR_ERR to extract
      the error code. However, in this case, the extracted error code will
      always be zero, which is unexpected.
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Reviewed-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2083d0e
    • Pan Bian's avatar
      net: lapbether: fix double free · 7db8874a
      Pan Bian authored
      The function netdev_priv() returns the private data of the device. The
      memory to store the private data is allocated in alloc_netdev() and is
      released in netdev_free(). Calling kfree() on the return value of
      netdev_priv() after netdev_free() results in a double free bug.
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7db8874a
    • John Fastabend's avatar
      bpf: remove SK_REDIRECT from UAPI · 04686ef2
      John Fastabend authored
      Now that SK_REDIRECT is no longer a valid return code. Remove it
      from the UAPI completely. Then do a namespace remapping internal
      to sockmap so SK_REDIRECT is no longer externally visible.
      
      Patchs primary change is to do a namechange from SK_REDIRECT to
      __SK_REDIRECT
      Reported-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04686ef2
    • Andrew Lunn's avatar
      net: phy: marvell: Only configure RGMII delays when using RGMII · 14fc0aba
      Andrew Lunn authored
      The fix 5987feb3 ("net: phy: marvell: logical vs bitwise OR typo")
      uncovered another bug in the Marvell PHY driver, which broke the
      Marvell OpenRD platform. It relies on the bootloader configuring the
      RGMII delays and does not specify a phy-mode in its device tree.  The
      PHY driver should only configure RGMII delays if the phy mode
      indicates it is using RGMII. Without anything in device tree, the
      mv643xx Ethernet driver defaults to GMII.
      
      Fixes: 5987feb3 ("net: phy: marvell: logical vs bitwise OR typo")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Tested-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14fc0aba
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-for-davem-2017-10-31' of... · b34a264f
      David S. Miller authored
      Merge tag 'wireless-drivers-for-davem-2017-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.14
      
      The most important here is the security vulnerabitility fix for
      ath10k.
      
      ath10k
      
      * fix security vulnerability with missing PN check on certain hardware
      
      * revert ath10k napi fix as it caused regressions on QCA6174
      
      wcn36xx
      
      * remove unnecessary rcu_read_unlock() from error path
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b34a264f
  2. 31 Oct, 2017 5 commits
  3. 30 Oct, 2017 2 commits
    • Kalle Valo's avatar
      Revert "ath10k: fix napi_poll budget overflow" · e48e9c42
      Kalle Valo authored
      Thorsten reported on <fa6e3ee2-91b5-a54b-afe3-87f30aac7a48@leemhuis.info> that
      commit c9353bf4 made ath10k unstable with QCA6174 on his Dell XPS13 (9360)
      with an error message:
      
      ath10k_pci 0000:3a:00.0: failed to extract amsdu: -11
      
      It only seemed to happen with certain APs, not all, but when it happened the
      only way to get ath10k working was to switch the wifi off and on with a hotkey.
      
      As this commit made things even worse (a warning vs breaking the whole
      connection) let's revert the commit for now and while the issue is being fixed.
      
      Link: http://lists.infradead.org/pipermail/ath10k/2017-October/010227.htmlReported-by: default avatarThorsten Leemhuis <linux@leemhuis.info>
      Signed-off-by: default avatarKalle Valo <kvalo@qca.qualcomm.com>
      e48e9c42
    • Vasanthakumar Thiagarajan's avatar
      ath10k: rebuild crypto header in rx data frames · 7eccb738
      Vasanthakumar Thiagarajan authored
      Rx data frames notified through HTT_T2H_MSG_TYPE_RX_IND and
      HTT_T2H_MSG_TYPE_RX_FRAG_IND expect PN/TSC check to be done
      on host (mac80211) rather than firmware. Rebuild cipher header
      in every received data frames (that are notified through those
      HTT interfaces) from the rx_hdr_status tlv available in the
      rx descriptor of the first msdu. Skip setting RX_FLAG_IV_STRIPPED
      flag for the packets which requires mac80211 PN/TSC check support
      and set appropriate RX_FLAG for stripped crypto tail. Hw QCA988X,
      QCA9887, QCA99X0, QCA9984, QCA9888 and QCA4019 currently need the
      rebuilding of cipher header to perform PN/TSC check for replay
      attack.
      
      Please note that removing crypto tail for CCMP-256, GCMP and GCMP-256 ciphers
      in raw mode needs to be fixed. Since Rx with these ciphers in raw
      mode does not work in the current form even without this patch and
      removing crypto tail for these chipers needs clean up, raw mode related
      issues in CCMP-256, GCMP and GCMP-256 can be addressed in follow up
      patches.
      Tested-by: default avatarManikanta Pubbisetty <mpubbise@qti.qualcomm.com>
      Signed-off-by: default avatarVasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
      Signed-off-by: default avatarKalle Valo <kvalo@qca.qualcomm.com>
      7eccb738
  4. 29 Oct, 2017 20 commits
    • Linus Torvalds's avatar
      Linux 4.14-rc7 · 0b07194b
      Linus Torvalds authored
      0b07194b
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 19e12196
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix route leak in xfrm_bundle_create().
      
       2) In mac80211, validate user rate mask before configuring it. From
          Johannes Berg.
      
       3) Properly enforce memory limits in fair queueing code, from Toke
          Hoiland-Jorgensen.
      
       4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
      
       5) Fix TSO header allocation and management in mvpp2 driver, from Yan
          Markman.
      
       6) Don't take socket lock in BH handler in strparser code, from Tom
          Herbert.
      
       7) Don't show sockets from other namespaces in AF_UNIX code, from
          Andrei Vagin.
      
       8) Fix double free in error path of tap_open(), from Girish Moodalbail.
      
       9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
          and Alexander Duyck.
      
      10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
      
      11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
          Long.
      
      12) Properly align SKB head before building SKB in tuntap, from Jason
          Wang.
      
      13) Avoid matching qdiscs with a zero handle during lookups, from Cong
          Wang.
      
      14) Fix various endianness bugs in sctp, from Xin Long.
      
      15) Fix tc filter callback races and add selftests which trigger the
          problem, from Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
        selftests: Introduce a new test case to tc testsuite
        selftests: Introduce a new script to generate tc batch file
        net_sched: fix call_rcu() race on act_sample module removal
        net_sched: add rtnl assertion to tcf_exts_destroy()
        net_sched: use tcf_queue_work() in tcindex filter
        net_sched: use tcf_queue_work() in rsvp filter
        net_sched: use tcf_queue_work() in route filter
        net_sched: use tcf_queue_work() in u32 filter
        net_sched: use tcf_queue_work() in matchall filter
        net_sched: use tcf_queue_work() in fw filter
        net_sched: use tcf_queue_work() in flower filter
        net_sched: use tcf_queue_work() in flow filter
        net_sched: use tcf_queue_work() in cgroup filter
        net_sched: use tcf_queue_work() in bpf filter
        net_sched: use tcf_queue_work() in basic filter
        net_sched: introduce a workqueue for RCU callbacks of tc filter
        sctp: fix some type cast warnings introduced since very beginning
        sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
        sctp: fix some type cast warnings introduced by transport rhashtable
        sctp: fix some type cast warnings introduced by stream reconf
        ...
      19e12196
    • David S. Miller's avatar
      Merge branch 'net_sched-fix-races-with-RCU-callbacks' · 6c325f4e
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: fix races with RCU callbacks
      
      Recently, the RCU callbacks used in TC filters and TC actions keep
      drawing my attention, they introduce at least 4 race condition bugs:
      
      1. A simple one fixed by Daniel:
      
      commit c78e1746
      Author: Daniel Borkmann <daniel@iogearbox.net>
      Date:   Wed May 20 17:13:33 2015 +0200
      
          net: sched: fix call_rcu() race on classifier module unloads
      
      2. A very nasty one fixed by me:
      
      commit 1697c4bb
      Author: Cong Wang <xiyou.wangcong@gmail.com>
      Date:   Mon Sep 11 16:33:32 2017 -0700
      
          net_sched: carefully handle tcf_block_put()
      
      3. Two more bugs found by Chris:
      https://patchwork.ozlabs.org/patch/826696/
      https://patchwork.ozlabs.org/patch/826695/
      
      Usually RCU callbacks are simple, however for TC filters and actions,
      they are complex because at least TC actions could be destroyed
      together with the TC filter in one callback. And RCU callbacks are
      invoked in BH context, without locking they are parallel too. All of
      these contribute to the cause of these nasty bugs.
      
      Alternatively, we could also:
      
      a) Introduce a spinlock to serialize these RCU callbacks. But as I
      said in commit 1697c4bb ("net_sched: carefully handle
      tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
      Potentially we need to do a lot of work to make it possible (if not
      impossible).
      
      b) Just get rid of these RCU callbacks, because they are not
      necessary at all, callers of these call_rcu() are all on slow paths
      and holding RTNL lock, so blocking is allowed in their contexts.
      However, David and Eric dislike adding synchronize_rcu() here.
      
      As suggested by Paul, we could defer the work to a workqueue and
      gain the permission of holding RTNL again without any performance
      impact, however, in tcf_block_put() we could have a deadlock when
      flushing workqueue while hodling RTNL lock, the trick here is to
      defer the work itself in workqueue and make it queued after all
      other works so that we keep the same ordering to avoid any
      use-after-free. Please see the first patch for details.
      
      Patch 1 introduces the infrastructure, patch 2~12 move each
      tc filter to the new tc filter workqueue, patch 13 adds
      an assertion to catch potential bugs like this, patch 14
      closes another rcu callback race, patch 15 and patch 16 add
      new test cases.
      ====================
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c325f4e
    • Chris Mi's avatar
      selftests: Introduce a new test case to tc testsuite · 31c2611b
      Chris Mi authored
      In this patchset, we fixed a tc bug. This patch adds the test case
      that reproduces the bug. To run this test case, user should specify
      an existing NIC device:
        # sudo ./tdc.py -d enp4s0f0
      
      This test case belongs to category "flower". If user doesn't specify
      a NIC device, the test cases belong to "flower" will not be run.
      
      In this test case, we create 1M filters and all filters share the same
      action. When destroying all filters, kernel should not panic. It takes
      about 18s to run it.
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarLucas Bates <lucasb@mojatatu.com>
      Signed-off-by: default avatarChris Mi <chrism@mellanox.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31c2611b
    • Chris Mi's avatar
      selftests: Introduce a new script to generate tc batch file · 7f071998
      Chris Mi authored
        # ./tdc_batch.py -h
        usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file
      
        TC batch file generator
      
        positional arguments:
          device                device name
          file                  batch file name
      
        optional arguments:
          -h, --help            show this help message and exit
          -n NUMBER, --number NUMBER
                                how many lines in batch file
          -o, --skip_sw         skip_sw (offload), by default skip_hw
          -s, --share_action    all filters share the same action
          -p, --prio            all filters have different prio
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarLucas Bates <lucasb@mojatatu.com>
      Signed-off-by: default avatarChris Mi <chrism@mellanox.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f071998
    • Cong Wang's avatar
      net_sched: fix call_rcu() race on act_sample module removal · 46e235c1
      Cong Wang authored
      Similar to commit c78e1746
      ("net: sched: fix call_rcu() race on classifier module unloads"),
      we need to wait for flying RCU callback tcf_sample_cleanup_rcu().
      
      Cc: Yotam Gigi <yotamg@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46e235c1
    • Cong Wang's avatar
      net_sched: add rtnl assertion to tcf_exts_destroy() · 2d132eba
      Cong Wang authored
      After previous patches, it is now safe to claim that
      tcf_exts_destroy() is always called with RTNL lock.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d132eba
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in tcindex filter · 27ce4f05
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27ce4f05
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in rsvp filter · d4f84a41
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4f84a41
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in route filter · c2f3f31d
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2f3f31d
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in u32 filter · c0d378ef
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c0d378ef
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in matchall filter · df2735ee
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df2735ee
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in fw filter · e071dff2
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e071dff2
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in flower filter · 0552c8af
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0552c8af
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in flow filter · 94cdb475
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94cdb475
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in cgroup filter · b1b5b04f
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1b5b04f
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in bpf filter · e910af67
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e910af67
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in basic filter · c96a4838
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c96a4838
    • Cong Wang's avatar
      net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d
      Cong Wang authored
      This patch introduces a dedicated workqueue for tc filters
      so that each tc filter's RCU callback could defer their
      action destroy work to this workqueue. The helper
      tcf_queue_work() is introduced for them to use.
      
      Because we hold RTNL lock when calling tcf_block_put(), we
      can not simply flush works inside it, therefore we have to
      defer it again to this workqueue and make sure all flying RCU
      callbacks have already queued their work before this one, in
      other words, to ensure this is the last one to execute to
      prevent any use-after-free.
      
      On the other hand, this makes tcf_block_put() ugly and
      harder to understand. Since David and Eric strongly dislike
      adding synchronize_rcu(), this is probably the only
      solution that could make everyone happy.
      
      Please also see the code comments below.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7aa0045d
    • David S. Miller's avatar
      Merge branch 'sctp-endianness-fixes' · 8c83c885
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for some sparse warnings
      
      As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
      warnings or errors checked by sparse appear. They are all problems
      about Endian and type cast.
      
      Most of them are just warnings by which no issues could be caused
      while some might be bugs.
      
      This patchset fixes them with four patches basically according to
      how they are introduced.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c83c885