1. 06 Jun, 2018 3 commits
  2. 05 Jun, 2018 37 commits
    • netdev-FAQ: clarify DaveM's position for stable backports · 75d4e704
      Cong Wang authored
      Per discussion with David at netconf 2018, let's clarify
      DaveM's position of handling stable backports in netdev-FAQ.
      
      This is important for people relying on upstream -stable
      releases.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: validate attributes in do_setlink() · 644c7eeb
      Eric Dumazet authored
      It seems that rtnl_group_changelink() can call do_setlink()
      while a prior call to validate_linkmsg(dev = NULL, ...) could
      not validate IFLA_ADDRESS / IFLA_BROADCAST.

      Make sure do_setlink() calls validate_linkmsg() itself instead
      of leaving this responsibility to its callers.

      With help from Dmitry Vyukov, thanks a lot!
      
      BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
      BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
      BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
      CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
       __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
       is_valid_ether_addr include/linux/etherdevice.h:199 [inline]
       eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline]
       eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308
       dev_set_mac_address+0x261/0x530 net/core/dev.c:7157
       do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317
       rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
       rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
       rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
       netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x455a09
      RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09
      RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
       kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527
       __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478
       do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315
       rtnl_group_changelink net/core/rtnetlink.c:2824 [inline]
       rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976
       rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
       netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
       kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2753 [inline]
       __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:988 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
       netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: e7ed828f ("netlink: support setting devgroup parameters")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
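The idea behind the fix above, that the function doing the copy validates its own input rather than trusting each caller, can be illustrated with a minimal userspace sketch. This is not the kernel code: `struct attr`, `validate_addr_attr()` and `set_mac()` are hypothetical names chosen for illustration, and the 6-byte address length is an assumption matching Ethernet.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ADDR_LEN 6 /* assumed Ethernet-style hardware address length */

/* Hypothetical stand-in for a netlink attribute: a length plus payload. */
struct attr {
    size_t len;
    const unsigned char *data;
};

/* Reject a missing or short attribute before anyone reads from it. */
static int validate_addr_attr(const struct attr *a)
{
    if (a == NULL || a->data == NULL)
        return -EINVAL;
    if (a->len < ADDR_LEN)
        return -EINVAL;
    return 0;
}

/* The setter validates on its own instead of relying on its callers. */
static int set_mac(unsigned char dst[ADDR_LEN], const struct attr *a)
{
    int err;

    assert(dst != NULL);
    err = validate_addr_attr(a);
    if (err)
        return err;
    memcpy(dst, a->data, ADDR_LEN);
    return 0;
}
```

With the check inside `set_mac()`, a new call path cannot bypass validation, which is the class of problem the KMSAN report points at.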
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · fd129f89
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-06-05
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
         connect: "This allows to override source IP (including the case when it's
         set via cmsg(3)) and destination IP:port for unconnected UDP (slow path).
         TCP and connected UDP (fast path) are not affected. This makes UDP support
         complete, that is, connected UDP is handled by connect hooks, unconnected
         by sendmsg ones.", from Andrey.
      
      2) Rework of the AF_XDP API to allow extending it in future for type writer
         model if necessary. In this mode a memory window is passed to hardware
         and multiple frames might be filled into that window instead of just one
         that is the case in the current fixed frame-size model. With the new
         changes made this can be supported without having to add a new descriptor
         format. Also, core bits for the zero-copy support for AF_XDP have been
         merged as agreed upon, where i40e bits will be routed via Jeff later on.
         Various improvements to documentation and sample programs included as
         well, all from Björn and Magnus.
      
      3) Given BPF's flexibility, a new program type has been added to implement
         infrared decoders. Quote: "The kernel IR decoders support the most
         widely used IR protocols, but there are many protocols which are not
         supported. [...] There is a 'long tail' of unsupported IR protocols,
         for which lircd is needed to decode the IR. IR encoding is done in such
         a way that some simple circuit can decode it; therefore, BPF is ideal.
         [...] user-space can define a decoder in BPF, attach it to the rc
         device through the lirc chardev.", from Sean.
      
      4) Several improvements and fixes to BPF core, among others, dumping map
         and prog IDs into fdinfo which is a straightforward way to correlate
         BPF objects used by applications, removing an indirect call and therefore
         retpoline in all map lookup/update/delete calls by invoking the callback
         directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper
         for tc BPF programs to have an efficient way of looking up cgroup v2 id
         for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
         state that hasn't been filled, to allow context access wrt pt_regs in
         32 bit archs for tracing, and last but not least various test cases
         for fixes that landed in bpf earlier, from Daniel.
      
      5) Get rid of the ndo_xdp_flush API and extend ndo_xdp_xmit with
         an XDP_XMIT_FLUSH flag instead, which avoids one indirect call
         as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.
      
      6) Add a new bpf_get_current_cgroup_id() helper that can be used in
         tracing to retrieve the cgroup id from the current process in order
         to allow for e.g. aggregation of container-level events, from Yonghong.
      
      7) Two follow-up fixes for BTF to reject invalid input values and
         related to that also two test cases for BPF kselftests, from Martin.
      
      8) Various API improvements to the bpf_fib_lookup() helper, that is,
         dropping MPLS bits which are not fully hashed out yet, rejecting
         invalid helper flags, returning error for unsupported address
         families as well as renaming flowlabel to flowinfo, from David.
      
      9) Various fixes and improvements to sockmap BPF kselftests in particular
         in proper error detection and data verification, from Prashant.
      
      10) Two arm32 BPF JIT improvements. One is to fix imm range check with
          regards to whether immediate fits into 24 bits, and a naming cleanup
          to get functions related to rsh handling consistent to those handling
          lsh, from Wang.
      
      11) Two compile warning fixes in BPF, one for BTF and one to silence a
          gcc false positive in stack_map_get_build_id_offset(), from Arnd.
      
      12) Add missing seg6.h header into tools include infrastructure in order
          to fix compilation of BPF kselftests, from Mathieu.
      
      13) Several formatting cleanups in the BPF UAPI helper description that
          also fix an error during rst2man compilation, from Quentin.
      
      14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
          not built into the kernel, from Yue.
      
      15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'devlink-extack' · a6fa9087
      David S. Miller authored
      David Ahern says:
      
      ====================
      devlink: Add extack messages for reload and port split/unsplit
      
      Patch 1 adds extack arg to reload, port_split and port_unsplit devlink
      operations.
      
      Patch 2 adds extack messages for reload operation in netdevsim.
      
      Patch 3 adds extack messages to port split/unsplit in mlxsw driver.
      
      v2
      - make the extack messages align with existing dev_err
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add extack messages for port_{un, }split failures · 3fcc773b
      David Ahern authored
      Return messages in extack for port split/unsplit errors. e.g.,
          $ devlink port split swp1s1 count 4
          Error: mlxsw_spectrum: Port cannot be split further.
          devlink answers: Invalid argument
      
          $ devlink port unsplit swp4
          Error: mlxsw_spectrum: Port was not split.
          devlink answers: Invalid argument
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Add extack error message for devlink reload · 7fa76d77
      David Ahern authored
      The devlink reload command can fail if a FIB resource limit is set to a value
      lower than the current occupancy. Return a proper message indicating the
      reason for the failure.
      
      $ devlink resource sh netdevsim/netdevsim0
      netdevsim/netdevsim0:
        name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
          resources:
            name fib size unlimited occ 43 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
            name fib-rules size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
        name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
          resources:
            name fib size unlimited occ 54 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
            name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
      
      $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 40
      
      $ devlink dev  reload netdevsim/netdevsim0
      Error: netdevsim: New size is less than current occupancy.
      devlink answers: Invalid argument
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add extack to reload and port_{un, }split operations · ac0fc8a1
      David Ahern authored
      Add extack argument to reload, port_split and port_unsplit operations.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: metrics: add proper netlink validation · 5b5e7a0d
      Eric Dumazet authored
      Before using nla_get_u32(), better make sure the attribute
      is of the proper size.
      
      The code was changed recently, but the bug has been there since the
      beginning of git history.
      
      BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746
      CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
       __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
       rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746
       fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361
       rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419
       fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287
       inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779
       rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
       netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x455a09
      RSP: 002b:00007faae5fd8c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007faae5fd96d4 RCX: 0000000000455a09
      RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529
       fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline]
       fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150
       fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146
       inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779
       rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
       netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
       kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2753 [inline]
       __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:988 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
       netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg net/socket.c:639 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
       __sys_sendmsg net/socket.c:2155 [inline]
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
       do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: a919525a ("net: Move fib_convert_metrics to metrics file")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
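The rule stated in the message above (check the attribute size before reading a 32-bit value from it) can be sketched in plain userspace C. This is a simplified illustration, not the kernel's netlink API: `struct attr_view` and `attr_get_u32_checked()` are hypothetical names, and the exact-length check is one reasonable policy assumed for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute view: payload pointer and payload length in bytes. */
struct attr_view {
    const void *payload;
    size_t len;
};

/* Read a u32 only if the payload is exactly four bytes long. */
static int attr_get_u32_checked(const struct attr_view *a, uint32_t *out)
{
    assert(out != NULL);
    if (a == NULL || a->payload == NULL || a->len != sizeof(uint32_t))
        return -EINVAL;
    /* memcpy avoids an unaligned load from the payload */
    memcpy(out, a->payload, sizeof(*out));
    return 0;
}
```

A two-byte payload is rejected here instead of being read as four bytes, two of which would be whatever happened to follow it in the buffer.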
    • ipmr: fix error path when ipmr_new_table fails · e783bb00
      Sabrina Dubroca authored
      commit 0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
      refactored ipmr_new_table, so that it now returns NULL when
      mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table
      expect an ERR_PTR.
      
      This can result in NULL deref, for example when ipmr_rules_exit calls
      ipmr_free_table with NULL net->ipv4.mrt in the
      !CONFIG_IP_MROUTE_MULTIPLE_TABLES version.
      
      This patch makes mr_table_alloc return errors, and changes
      ip6mr_new_table and its callers to return/expect error pointers as
      well. It also removes the version of mr_table_alloc defined under
      !CONFIG_IP_MROUTE_COMMON, since it is never used.
      
      Fixes: 0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
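The mismatch described above (an allocator returning NULL while its callers test for an error pointer) is easier to see with the convention spelled out. The sketch below imitates the kernel's ERR_PTR / IS_ERR / PTR_ERR helpers in userspace; `err_ptr()`, `is_err()`, `ptr_err()`, `table_alloc()` and `table_create()` are illustrative names, and the `simulate_oom` parameter exists only to exercise the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct table { unsigned int id; };

/* Hypothetical allocator: returns an error pointer, never NULL, on failure. */
static struct table *table_alloc(unsigned int id, int simulate_oom)
{
    struct table *t = simulate_oom ? NULL : malloc(sizeof(*t));

    if (t == NULL)
        return err_ptr(-ENOMEM);
    t->id = id;
    return t;
}

/* The caller checks with is_err(), matching what the allocator returns. */
static int table_create(unsigned int id, int simulate_oom, struct table **out)
{
    struct table *t;

    assert(out != NULL);
    t = table_alloc(id, simulate_oom);
    if (is_err(t))
        return (int)ptr_err(t);
    *out = t;
    return 0;
}
```

Note that `is_err(NULL)` is false, so a NULL return from an allocator would pass this check and be dereferenced later, which is the failure mode the commit describes.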
    • ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds · 848235ed
      Sabrina Dubroca authored
      Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during
      ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same
      setsockopt will fail with -ENOENT, since we haven't actually created
      that table.
      
      A similar fix for ipv4 was included in commit 5e1859fb ("ipv4: ipmr:
      various fixes and cleanups").
      
      Fixes: d1db275d ("ipv6: ip6mr: support multiple tables")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: remove unused hclgevf_cfg_func_mta_filter · 4f416db9
      Arnd Bergmann authored
      The last patch apparently added a complete replacement for this
      function, but left the old one in place, which now causes a
      harmless warning:
      
      drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c:731:12: 'hclgevf_cfg_func_mta_filter' defined but not used
      
      I assume it can be removed.
      
      Fixes: 3a678b58 ("net: hns3: Optimize the VF's process of updating multicast MAC")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: provide udp*_lib_lookup for nf_tproxy · 6e86000c
      Arnd Bergmann authored
      It is now possible to enable the libified nf_tproxy modules without
      also enabling NETFILTER_XT_TARGET_TPROXY, which throws off the
      ifdef logic in the udp core code:
      
      net/ipv6/netfilter/nf_tproxy_ipv6.o: In function `nf_tproxy_get_sock_v6':
      nf_tproxy_ipv6.c:(.text+0x1a8): undefined reference to `udp6_lib_lookup'
      net/ipv4/netfilter/nf_tproxy_ipv4.o: In function `nf_tproxy_get_sock_v4':
      nf_tproxy_ipv4.c:(.text+0x3d0): undefined reference to `udp4_lib_lookup'
      
      We can actually simplify the conditions now to provide the two functions
      exactly when they are needed.
      
      Fixes: 45ca4e0c ("netfilter: Libify xt_TPROXY")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed*: Utilize FW 8.37.2.0 · d52c89f1
      Michal Kalderon authored
      This FW contains several fixes and features.
      
      RDMA
      - Several modifications and fixes for Memory Windows
      - drop vlan and tcp timestamp from mss calculation in driver for
        this FW
      - Fix SQ completion flow when local ack timeout is infinite
      - Modifications in t10dif support
      
      ETH
      - Fix aRFS for tunneled traffic without inner IP.
      - Fix chip configuration which may fail under heavy traffic conditions.
      - Support receiving any-VNI in VXLAN and GENEVE RX classification.
      
      iSCSI / FCoE
      - Fix iSCSI recovery flow
      - Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0
      
      Misc
      - Several registers (split registers) won't read correctly with
        ethtool -d
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tcp: remove useless tw_timeout field · 95358a95
      Maciej Żenczykowski authored
      Tested: 'git grep tw_timeout' comes up empty and it builds :-)
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls: Fix offloading when ingress dev is vxlan · d96a43c6
      Paul Blakey authored
      When using a vxlan device as the ingress dev, we count it as a
      "no offload dev", so when such a rule comes and err stop is true,
      we fail early and don't try the egdev route which can offload it
      through the egress device.
      
      Fix that by not calling the block offload if one of the devices
      attached to it is not offload capable, but make sure egress on such case
      is capable instead.
      
      Fixes: caa72601 ("net: sched: keep track of offloaded filters [..]")
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: not allow transport timeout value less than HZ/5 for hb_timer · 1d88ba1e
      Xin Long authored
      syzbot reported a rcu_sched self-detected stall on CPU which is caused
      by too small a value being set on rto_min with the SCTP_RTOINFO sockopt.
      With this value, hb_timer gets stuck: its timer handler restarts the
      timer with the same value, and then runs the timer handler again.

      This problem has been there since the very beginning, and thanks to Eric
      for the reproducer shared from a syzbot mail.
      
      This patch fixes it by not allowing sctp_transport_timeout to return a
      smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.
      
      Note that it doesn't fix this issue by limiting rto_min, as some users
      are still using small rto and no proper value was found for it yet.
      
      Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
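The floor described above can be expressed as a simple clamp. This is a minimal sketch of the arithmetic only, not the sctp_transport_timeout() implementation: `clamp_hb_timeout()` is a hypothetical helper, and HZ is fixed at 1000 ticks per second purely as an assumption (in the kernel it is a build-time setting).

```c
#include <assert.h>

#define HZ 1000UL /* assumed tick rate for this sketch */

/* The floor HZ/5 must be non-zero for the clamp to have any effect. */
static_assert(HZ >= 5, "HZ / 5 must not truncate to zero");

/* Clamp a computed heartbeat timeout (in ticks) to a floor of HZ/5. */
static unsigned long clamp_hb_timeout(unsigned long timeout)
{
    const unsigned long min_timeout = HZ / 5;

    return timeout < min_timeout ? min_timeout : timeout;
}
```

With HZ at 1000, any timeout below 200 ticks (200 ms) is raised to 200, so a timer re-armed from its own handler can no longer fire back-to-back with a near-zero delay.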
    • bpfilter: switch to CC from HOSTCC · 819dd92b
      Alexei Starovoitov authored
      Check that CC can build executables and use that compiler instead of HOSTCC.
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: fix error return code in mlx5e_alloc_rq() · 47a6ca3f
      Wei Yongjun authored
      Fix to return error code -ENOMEM from the kvzalloc_node() error handling
      case instead of 0, as done elsewhere in this function.
      
      Fixes: 069d1146 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Make function mlx5e_change_rep_mtu() static · 6f6027a5
      Wei Yongjun authored
      Fixes the following sparse warning:
      
      drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning:
       symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Fix use after free while sending command ack · 3602207c
      Subash Abhinov Kasiviswanathan authored
      When sending an ack to a command packet, the skb is still referenced
      after it is sent to the real device. Since the real device could
      free the skb, the device pointer would be invalid.
      Also, remove an unnecessary variable.
      
      Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
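The use-after-free pattern described above, touching a buffer after handing it to a function that may free it, can be sketched in userspace as follows. This is an illustration of the general technique (save what you need before the hand-off), not the rmnet code: `struct device`, `struct packet`, `xmit()` and `send_ack()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

struct device { unsigned long tx_packets; };
struct packet { struct device *dev; };

/* Hypothetical transmit path: takes ownership and frees the packet. */
static void xmit(struct packet *p)
{
    free(p);
}

/* Save what is needed from the packet before ownership is handed over. */
static void send_ack(struct packet *p)
{
    struct device *dev;

    assert(p != NULL && p->dev != NULL);
    dev = p->dev;       /* read while p is still valid */
    xmit(p);            /* p must not be touched after this call */
    dev->tx_packets++;  /* uses the saved pointer, not p->dev */
}
```

Reading `p->dev` after `xmit(p)` would be the bug; keeping the device pointer in a local variable removes the need to look at the packet again.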
    • net: ipv6: Generate random IID for addresses on RAWIP devices · 9deb441c
      Subash Abhinov Kasiviswanathan authored
      RAWIP devices such as rmnet do not have a hardware address and
      instead require the kernel to generate a random IID for the
      IPv6 addresses.
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refactor tcp_ecn_check_ce to remove sk type cast · f4c9f85f
      Yousuk Seung authored
      Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock*
      instead of tcp_sock* to clean up type casts. This is a pure refactor
      patch.
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-af-xdp-zc-api' · 9fa06104
      Daniel Borkmann authored
      Björn Töpel says:
      
      ====================
      This patch series introduces zerocopy (ZC) support for
      AF_XDP. Programs using AF_XDP sockets will now receive RX packets
      without any copies and can also transmit packets without incurring any
      copies. No modifications to the application are needed, but the NIC
      driver needs to be modified to support ZC. If ZC is not supported by
      the driver, the modes introduced in the AF_XDP patch will be
      used. Using ZC in our micro benchmarks results in significantly
      improved performance as can be seen in the performance section later
      in this cover letter.
      
      Note that for an untrusted application, HW packet steering to a
      specific queue pair (the one associated with the application) is a
      requirement when using ZC, as the application would otherwise be able
      to see other user space processes' packets. If the HW cannot support
      the required packet steering you need to use the XDP_SKB mode or the
      XDP_DRV mode without ZC turned on. The XSKMAP introduced in the AF_XDP
      patch set can be used to do load balancing in that case.
      
      For benchmarking, you can use the xdpsock application from the AF_XDP
      patch set without any modifications. Say that you would like your UDP
      traffic from port 4242 to end up in queue 16, on which we will enable
      AF_XDP. Here, we use ethtool for this:
      
            ethtool -N p3p2 rx-flow-hash udp4 fn
            ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
                action 16
      
      Running the rxdrop benchmark in XDP_DRV mode with zerocopy can then be
      done using:
      
            samples/bpf/xdpsock -i p3p2 -q 16 -r -N
      
      We have run some benchmarks on a dual socket system with two Broadwell
      E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
      cores which gives a total of 28, but only two cores are used in these
      experiments. One for TX/RX and one for the user space application. The
      memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
      8192MB and with 8 of those DIMMs in the system we have 64 GB of total
      memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
      NIC is Intel I40E 40Gbit/s using the i40e driver.
      
      Below are the results in Mpps of the I40E NIC benchmark runs for 64
      and 1500 byte packets, generated by a commercial packet generator HW
      outputting packets at full 40 Gbit/s line rate. The results are without
      retpoline so that we can compare against previous numbers.
      
      AF_XDP performance 64 byte packets. Results from the AF_XDP V3 patch
      set are also reported for ease of reference. The numbers within
      parentheses are from the RFC V1 ZC patch set.
      Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
      rxdrop       2.9*       9.6*       21.1(21.5)
      txpush       2.6*       -          22.0(21.6)
      l2fwd        1.9*       2.5*       15.3(15.0)
      
      AF_XDP performance 1500 byte packets:
      Benchmark   XDP_SKB   XDP_DRV     XDP_DRV with zerocopy
      rxdrop       2.1*       3.3*       3.3(3.3)
      l2fwd        1.4*       1.8*       3.1(3.1)
      
      * From AF_XDP V3 patch set and cover letter.
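
For context on the tables above, the theoretical line rate in packets per second can be estimated as follows. This is a back-of-the-envelope sketch, not part of the patch set: it assumes the quoted packet sizes are Ethernet frame sizes and that each frame carries 20 extra bytes on the wire (preamble, start-of-frame delimiter and inter-frame gap). The function name is hypothetical.

```python
def line_rate_mpps(frame_bytes: int, link_gbps: float = 40.0,
                   overhead_bytes: int = 20) -> float:
    """Upper bound in Mpps for back-to-back frames on an Ethernet link."""
    # Assumed per-frame wire overhead: 7 B preamble + 1 B SFD + 12 B gap.
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


for size in (64, 1500):
    print(f"{size:>4} byte frames: {line_rate_mpps(size):.2f} Mpps")
```

Under these assumptions the limit is about 59.5 Mpps for 64 byte frames and about 3.3 Mpps for 1500 byte frames, which suggests that the 1500 byte rxdrop results are bounded by the link rather than by the software path.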
      
      So why do we not get higher values for RX similar to the 34 Mpps we
      had in AF_PACKET V4? We ran an experiment with the rxdrop
      benchmark that used neither the xdp_do_redirect/flush infrastructure
      nor an XDP program (all traffic on a queue goes to one
      socket). Instead the driver acts directly on the AF_XDP socket. With
      this we got 36.9 Mpps, a significant improvement without any change to
      the uapi. So not forcing users to have an XDP program if they do not
      need it might be a good idea. This measurement is actually higher
      than what we got with AF_PACKET V4.
      
      XDP performance on our system as a base line:
      
      64 byte packets:
      XDP stats       CPU     pps         issue-pps
      XDP-RX CPU      16      32.3M       0
      
      1500 byte packets:
      XDP stats       CPU     pps         issue-pps
      XDP-RX CPU      16      3.3M        0
      
      The structure of the patch set is as follows:
      
      Patches 1-3: Plumbing for AF_XDP ZC support
      Patches 4-5: AF_XDP ZC for RX
      Patches 6-7: AF_XDP ZC for TX
      Patches 8-10: ZC support for i40e.
      Patch 11: Use the bind flags in sample application to force TX skb
                path when -S is provided on the command line.
      
      This patch set is based on the new uapi introduced in "AF_XDP: bug
      fixes and descriptor changes". You need to apply that patch set
      first, before applying this one.
      
      We based this patch set on bpf-next commit bd3a08aa ("bpf:
      flowlabel in bpf_fib_lookup should be flowinfo")
      
      Comments:
      
      * Implementing dynamic creation and deletion of queues in the i40e
        driver would facilitate the coexistence of xdp_redirect and af_xdp.
      
      Thanks: Björn and Magnus
      ====================
      
      Note: as agreed upon, i40e/zc bits will be routed via Jeff's tree.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      9fa06104
    • David Ahern's avatar
      net/ipv6: prevent use after free in ip6_route_mpath_notify · f7225172
      David Ahern authored
      syzbot reported a use-after-free:
      
      BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
      Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555
      
      CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
       ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
       ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Allocated by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:104
       __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
       ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
       ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
       ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Freed by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:140
       dst_release_immediate+0x71/0x9e net/core/dst.c:205
       fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      The problem is that rt_last can point to a deleted route if the insert
      fails.
      
      One reproducer is to insert a route and then add a multipath route that
      has a duplicate nexthop, e.g.:
          $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
          $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2
      
      Fix by not setting rt_last until it is verified that the insert succeeded.
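
The idea of the fix can be sketched with a small Python model. This is not the kernel code: the routing table is a plain set of nexthops, a duplicate nexthop stands in for a failed insert, and stopping at the first failure is an assumption of the sketch. The function name is hypothetical.

```python
def add_multipath(table: set, nexthops: list):
    """Insert nexthops in order; return the last one actually inserted."""
    rt_last = None
    for nh in nexthops:
        if nh in table:
            # Insert failed (duplicate). In the kernel the rejected route is
            # freed, so it must not be remembered as rt_last.
            break
        table.add(nh)
        # Record rt_last only after the insert is known to have succeeded.
        rt_last = nh
    return rt_last


# Mirrors the reproducer: ::2 already exists, then ::4 and ::2 are appended.
table = {"2001:db8:1::2"}
rt_last = add_multipath(table, ["2001:db8:1::4", "2001:db8:1::2"])
print(rt_last)
```

Here rt_last ends up referring to the ::4 route, which is still in the table, rather than to the rejected duplicate.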
      
      Fixes: 3b1137fe ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7225172
    • Björn Töpel's avatar
      samples/bpf: xdpsock: use skb Tx path for XDP_SKB · 9f5232cc
      Björn Töpel authored
      Make sure that XDP_SKB also uses the skb Tx path.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      9f5232cc
    • Magnus Karlsson's avatar
      xsk: wire upp Tx zero-copy functions · ac98d8aa
      Magnus Karlsson authored
      Here we add the functionality required to support zero-copy Tx, and
      also expose various zero-copy related functions for the netdevs.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ac98d8aa
    • Magnus Karlsson's avatar
      net: added netdevice operation for Tx · e3760c7e
      Magnus Karlsson authored
      Added ndo_xsk_async_xmit. This ndo "kicks" the netdev to start to pull
      userland AF_XDP Tx frames from a NAPI context.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e3760c7e
    • Björn Töpel's avatar
      xsk: add zero-copy support for Rx · 173d3adb
      Björn Töpel authored
      Extend xsk_rcv to support the new MEM_TYPE_ZERO_COPY memory, and
      wire up the ndo_bpf call in bind.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      173d3adb
    • Björn Töpel's avatar
      xdp: add MEM_TYPE_ZERO_COPY · 02b55e56
      Björn Töpel authored
      Here, a new type of allocator support is added to the XDP return
      API. A zero-copy allocated xdp_buff cannot be converted to an
      xdp_frame. Instead, the buff has to be copied. This is not supported
      at all in this commit.
      
      Also, an opaque "handle" is added to xdp_buff. This can be used as a
      context for the zero-copy allocator implementation.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      02b55e56
    • Björn Töpel's avatar
      net: xdp: added bpf_netdev_command XDP_{QUERY, SETUP}_XSK_UMEM · 74515c57
      Björn Töpel authored
      Extend ndo_bpf with two new commands, used to query zero-copy support
      and to register a UMEM to a queue_id of a netdev.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      74515c57
    • Björn Töpel's avatar
      xsk: introduce xdp_umem_page · 8aef7340
      Björn Töpel authored
      The xdp_umem_page holds the address for a page. Trade memory for
      faster lookup. Later, we'll add DMA address here as well.
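
The memory-for-speed trade can be pictured with a small Python model: one stored address per page turns an offset lookup into an index plus a mask. This is only an illustration, not the kernel data structure; the class name, the 4 KiB page size and the example addresses are assumptions.

```python
PAGE_SHIFT = 12                      # assumes 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class UmemPages:
    """Per-page address table: one stored address for each page of a region."""

    def __init__(self, page_addrs):
        # Costs one entry per page, but avoids any walk or search on lookup.
        self.pages = list(page_addrs)

    def lookup(self, offset: int) -> int:
        """Translate an offset into the region into a stored page address."""
        return self.pages[offset >> PAGE_SHIFT] + (offset & PAGE_MASK)


# Two pages that are not contiguous in the (made-up) address space.
umem = UmemPages([0x10000000, 0x50000000])
print(hex(umem.lookup(0x1004)))
```

A second per-page field, such as the DMA address mentioned above, would fit the same table layout.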
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      8aef7340
    • Björn Töpel's avatar
      xsk: moved struct xdp_umem definition · e61e62b9
      Björn Töpel authored
      Moved struct xdp_umem to xdp_sock.h, in order to prepare for zero-copy
      support.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e61e62b9
    • Kun Yi's avatar
      net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default. · 69e2eccc
      Kun Yi authored
      The BCM54612E has four multi-functional LED pins that can be configured
      through register settings; the LED4 pin can be configured as a 125 MHz
      reference clock output by setting the spare register. Since the dedicated
      CLK125 reference clock pin is not brought out on the 48-pin MLP, the LED4
      pin is the only pin that can provide this function in this package, and
      therefore it is beneficial to just enable the reference clock by default.
      Signed-off-by: Kun Yi <kunyi@google.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69e2eccc
    • Guillaume Nault's avatar
      l2tp: fix refcount leakage on PPPoL2TP sockets · 3d609342
      Guillaume Nault authored
      Commit d02ba2a6 ("l2tp: fix race in pppol2tp_release with session
      object destroy") tried to fix a race condition where a PPPoL2TP socket
      would disappear while the L2TP session was still using it. However, it
      missed the root issue, which is that an L2TP session may accept being
      reconnected after its associated socket has entered the release process.
      
      The tentative fix makes the session hold the socket it is connected to.
      That saves the kernel from crashing, but introduces refcount leakage,
      preventing the socket from completing the release process. Once stalled,
      everything the socket depends on can't be released anymore, including
      the L2TP session and the l2tp_ppp module.
      
      The root issue is that, when releasing a connected PPPoL2TP socket, the
      session's ->sk pointer (RCU-protected) is reset to NULL and we have to
      wait for a grace period before destroying the socket. The socket drops
      the session in its ->sk_destruct callback function, so the session
      will exist until the last reference on the socket is dropped.
      Therefore, there is a time frame where pppol2tp_connect() may accept
      reconnecting a session, as it only checks ->sk to figure out if the
      session is connected. This time frame is shortened by the fact that
      pppol2tp_release() calls l2tp_session_delete(), making the session
      unreachable before resetting ->sk. However, pppol2tp_connect() may
      grab the session before it gets unhashed by l2tp_session_delete(), but
      it may test ->sk after the latter got reset. The race is not so hard to
      trigger and syzbot found a pretty reliable reproducer:
      https://syzkaller.appspot.com/bug?id=418578d2a4389074524e04d641eacb091961b2cf
      
      Before d02ba2a6, another race could let pppol2tp_release()
      overwrite the ->__sk pointer of an L2TP session, thus tricking
      pppol2tp_put_sk() into calling sock_put() on a socket that is different
      than the one for which pppol2tp_release() was originally called. To get
      there, we had to trigger the race described above, therefore having one
      PPPoL2TP socket being released, while the session it is connected to is
      reconnecting to a different PPPoL2TP socket. When releasing this new
      socket fast enough, pppol2tp_release() overwrites the session's
      ->__sk pointer with the address of the new socket, before the first
      pppol2tp_put_sk() call gets scheduled. Then the pppol2tp_put_sk() call
      invoked by the original socket will sock_put() the new socket,
      potentially dropping its last reference. When the second
      pppol2tp_put_sk() finally runs, its socket has already been freed.
      
      With d02ba2a6, the session takes a reference on both sockets.
      Furthermore, the session's ->sk pointer is reset in the
      pppol2tp_session_close() callback function rather than in
      pppol2tp_release(). Therefore, ->__sk can't be overwritten and
      pppol2tp_put_sk() is called only once (l2tp_session_delete() will only
      run pppol2tp_session_close() once, to protect the session against
      concurrent deletion requests). Now pppol2tp_put_sk() will properly
      sock_put() the original socket, but the new socket will remain, as
      l2tp_session_delete() prevented the release process from completing.
      Here, we don't depend on the ->__sk race to trigger the bug. Getting
      into the pppol2tp_connect() race is enough to leak the reference, no
      matter when the new socket is released.
      
      So it all boils down to pppol2tp_connect() failing to realise that the
      session has already been connected. This patch drops the unneeded extra
      reference counting (mostly reverting d02ba2a6) and checks that
      neither ->sk nor ->__sk is set before allowing a session to be
      connected.
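
The check described above can be modelled in a few lines of Python. This is a simplified sketch rather than the kernel implementation: `_sk` stands in for ->__sk, the function names are hypothetical, and the sketch assumes that release moves the socket from ->sk to ->__sk until the deferred put has run.

```python
class Session:
    def __init__(self):
        self.sk = None    # socket currently connected to the session
        self._sk = None   # models ->__sk: socket still being released


def connect(session: Session, sock) -> bool:
    """Refuse to connect while either pointer still refers to a socket."""
    if session.sk is not None or session._sk is not None:
        return False
    session.sk = sock
    return True


def begin_release(session: Session) -> None:
    """Model the start of release: ->sk is reset, ->__sk keeps the socket."""
    session._sk = session.sk
    session.sk = None


s = Session()
first = connect(s, "socket A")
begin_release(s)
second = connect(s, "socket B")   # arrives while socket A is being released
print(first, second)
```

Testing only `sk` would let the second connect through during the release window; testing both pointers rejects it.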
      
      Fixes: d02ba2a6 ("l2tp: fix race in pppol2tp_release with session object destroy")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d609342
    • David S. Miller's avatar
      Merge branch 'net-phy-improve-PM-handling-of-PHY-MDIO' · 7a723099
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: improve PM handling of PHY/MDIO
      
      The current implementation of MDIO bus PM ops doesn't actually implement
      bus-specific PM ops but just calls PM ops defined at the device level,
      which doesn't seem to be fully in line with the core PM model.
      
      When looking e.g. at __device_suspend() the PM core looks for PM ops
      of a device in a specific order:
      1. device PM domain
      2. device type
      3. device class
      4. device bus
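
The lookup order listed above amounts to a first-match search, which can be sketched in Python. This is a simplified model and not the PM core itself: the device is a plain dict, and the ops names in the example are hypothetical placeholders.

```python
LOOKUP_ORDER = ("pm_domain", "type", "class", "bus")


def pick_pm_ops(dev: dict):
    """Return (level, ops) for the first level that provides PM ops."""
    for level in LOOKUP_ORDER:
        holder = dev.get(level)
        if holder and holder.get("pm"):
            return level, holder["pm"]
    return None, None


# A PHY modelled as a device type: the type-level ops win over the bus ops.
phy = {
    "pm_domain": None,
    "type": {"pm": "phy_type_pm_ops"},   # hypothetical name
    "class": None,
    "bus": {"pm": "mdio_bus_pm_ops"},    # hypothetical name
}
level, ops = pick_pm_ops(phy)
print(level, ops)
```

Because the device type is consulted before the bus, PM ops placed on the type take effect without any bus-level ops being involved.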
      
      I think there is good reason that there are no PM ops at the device level.
      The situation can be improved by modeling PHYs as a device type of
      an MDIO device. If PM ops are needed for some other type of MDIO device,
      it could be modeled as struct device_type as well.
      ====================
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a723099
    • Heiner Kallweit's avatar
      net: phy: remove PM ops from MDIO bus · 9107c05e
      Heiner Kallweit authored
      The current implementation of MDIO bus PM ops doesn't actually implement
      bus-specific PM ops but just calls PM ops defined at the device level,
      which doesn't seem to be fully in line with the core PM model.
      
      When looking e.g. at __device_suspend() the PM core looks for PM ops
      of a device in a specific order:
      1. device PM domain
      2. device type
      3. device class
      4. device bus
      
      I think there is good reason that there are no PM ops at the device level.
      
      Now that a device type representation of PHYs as a special type of MDIO
      device was added (the only user of MDIO bus PM ops), the MDIO bus
      PM ops can be removed, including member pm of struct mdio_device.
      
      If for some other type of MDIO device PM ops are needed, it should be
      modeled as struct device_type as well.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9107c05e
    • Heiner Kallweit's avatar
      net: phy: add struct device_type representation of a PHY · 7f4828ff
      Heiner Kallweit authored
      A PHY is a type of MDIO device, so let's model it as struct device_type
      and place PM ops, attribute groups and release callback on device type
      level. For this the attribute definitions have to be moved.
      This change allows us to get rid of the PM ops on a bus level in a second
      step.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f4828ff