1. 22 Feb, 2019 11 commits
    • net: thunderx: correct typo in macro name · f6d25aca
      Vadim Lomovtsev authored
      Correct STREERING to STEERING in the macro name for the BGX steering register.
      Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version · efcc9bca
      Lorenzo Bianconi authored
      Fix a possible NULL pointer dereference in ip6erspan_set_version by checking
      the nlattr data pointer.
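A minimal user-space sketch of the kind of guard described above, assuming a simplified attribute table. The names `set_erspan_version`, `tunnel_parms` and `ATTR_ERSPAN_VER` are hypothetical stand-ins, not the kernel's definitions, and the default version of 1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the netlink attribute table and tunnel parms. */
enum { ATTR_ERSPAN_VER, ATTR_MAX };

struct tunnel_parms {
	uint8_t erspan_ver;
};

/*
 * Set the ERSPAN version from an optional attribute table.
 * "attrs" may be NULL when the caller supplied no attributes at all,
 * so it must be checked before any attrs[i] access.
 */
static void set_erspan_version(const uint8_t *const attrs[],
			       struct tunnel_parms *parms)
{
	assert(parms != NULL);

	if (!attrs)
		return; /* nothing supplied: leave parms untouched */

	parms->erspan_ver = 1; /* assumed default version */
	if (attrs[ATTR_ERSPAN_VER])
		parms->erspan_ver = *attrs[ATTR_ERSPAN_VER];
}
```

Without the early return, the `attrs[ATTR_ERSPAN_VER]` access would dereference a NULL table, which is the class of fault the trace below reports.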
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7549 Comm: syz-executor432 Not tainted 5.0.0-rc6-next-20190218
      #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6erspan_newlink+0x66/0x7b0 net/ipv6/ip6_gre.c:2210
        __rtnl_newlink+0x107b/0x16c0 net/core/rtnetlink.c:3176
        rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3234
        rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192
        netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485
        rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg+0xdd/0x130 net/socket.c:631
        ___sys_sendmsg+0x806/0x930 net/socket.c:2136
        __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
        __do_sys_sendmsg net/socket.c:2183 [inline]
        __se_sys_sendmsg net/socket.c:2181 [inline]
        __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440159
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffa69156e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
      RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000011 R11: 0000000000000246 R12: 00000000004019e0
      R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace 09f8a7d13b4faaa1 ]---
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 4974d5f6 ("net: ip6_gre: initialize erspan_ver just for erspan tunnels")
      Reported-and-tested-by: syzbot+30191cf1057abd3064af@syzkaller.appspotmail.com
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: use operstate consistently for linkup · 8c7a7726
      George Wilkie authored
      When a port is added to a team, its initial state is derived
      from netif_carrier_ok rather than netif_oper_up.
      If it is carrier up but operationally down at the time of being
      added, the port state.linkup will be set prematurely.
      port state.linkup should be set consistently using
      netif_oper_up rather than netif_carrier_ok.
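
The difference between the two checks can be sketched in user space as follows. This is a simplified model, not the kernel helpers: `port_model`, `carrier_ok` and `oper_up` are illustrative names, and the rule that an unknown operational state counts as up is an assumption carried over from how netif_oper_up is commonly described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a port's link state. */
enum oper_state { OPER_UNKNOWN, OPER_DOWN, OPER_UP };

struct port_model {
	bool carrier;          /* physical carrier detected */
	enum oper_state oper;  /* operational state */
};

/* Carrier-only check: true as soon as the link has carrier. */
static bool carrier_ok(const struct port_model *p)
{
	assert(p != NULL);
	return p->carrier;
}

/* Operational check: UP, or UNKNOWN for devices that never report a state. */
static bool oper_up(const struct port_model *p)
{
	assert(p != NULL);
	return p->oper == OPER_UP || p->oper == OPER_UNKNOWN;
}
```

A port with `carrier == true` and `oper == OPER_DOWN` passes `carrier_ok()` but not `oper_up()`, which is the premature-linkup case described above.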
      
      Fixes: f1d22a1e ("team: account for oper state")
      Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Fix an error on RTL8153-BD MAC Address Passthrough support · c286909f
      David Chen authored
      RTL8153-BD is used in Dell DA300 type-C dongle.
      Added RTL8153-BD support to activate MAC address pass through on DA300.
      Apply correction on previously submitted patch in net.git tree.
      Signed-off-by: David Chen <david.chen7@dell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: disallow userns cap_net_admin to change global mode/flags · 7cc9f700
      Daniel Borkmann authored
      When running Docker with userns isolation e.g. --userns-remap="default"
      and spawning up some containers with CAP_NET_ADMIN under this realm, I
      noticed that link changes on ipvlan slave device inside that container
      can affect all devices from this ipvlan group which are in other net
      namespaces where the container should have no permission to make changes
      to, such as the init netns, for example.
      
      This effectively allows undoing ipvlan private mode and switching globally to
      bridge mode, where slaves can communicate directly without going through
      hostns, or switching the global operation mode (l2/l3/l3s)
      for everyone bound to the given ipvlan master device. The libnetwork plugin
      here creates an ipvlan master and an ipvlan slave in hostns, plus one slave
      per container that is moved into the container's netns upon creation.
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
           link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
           ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
           inet 10.41.0.1/32 scope link cilium_host
             valid_lft forever preferred_lft forever
        [...]
      
      * Spawn container & change ipvlan mode setting inside of it:
      
        # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
        9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
        # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns (mode switched to l2):
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      Same l3 -> l2 switch would also happen by creating another slave inside
      the container's network namespace when specifying the existing cilium0
      link to derive the actual (bond0) master:
      
        # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      One way to mitigate this is to check the CAP_NET_ADMIN permissions of
      the ipvlan master device's ns, and only then allow changing the
      mode or flags for all devices bound to it. The two cases above are
      then disallowed after the patch.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: don't compare hb_timer expire date before starting it · d1f20c03
      Maciej Kwiecien authored
      hb_timer might not start at all for a particular transport because its
      start is conditional. As a result, a node does not send heartbeats.
      
      Function sctp_transport_reset_hb_timer has two roles:
          - initial start of hb_timer for a given transport,
          - update expire date of hb_timer for a given transport.
      The function is optimized to update the timer's expiry only if it is before
      the newly calculated one, but this comparison is invalid for a timer which
      has not yet started. Such a timer has expire == 0, and if the new expire
      value is (MAX_JIFFIES / 2 + 2) or more, the "time_before" macro evaluates
      to false and the timer does not start, so the node sends no heartbeat
      packets.
      
      This was found when an association was initialized within the first 5 minutes
      after system boot, because the initial jiffies value is close to MAX_JIFFIES.
      
      Test kernel version: 4.9.154 (ARCH=arm)
      hb_timer.expire = 0;                //initialized, not started timer
      new_expire = MAX_JIFFIES / 2 + 2;   //or more
      time_before(hb_timer.expire, new_expire) == false
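
The wraparound arithmetic behind those three lines can be reproduced in user space. The sketch below models a 32-bit jiffies counter (matching ARCH=arm); `time_before32`, `should_arm_old` and `should_arm_fixed` are illustrative names, and the "arm whenever the timer is not pending" guard is one plausible shape of the fix rather than a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit jiffies model; these names are illustrative, not kernel symbols. */
typedef uint32_t jiffies32_t;
#define MAX_JIFFIES32 UINT32_MAX

/*
 * Wrap-safe "a is before b": the unsigned difference is reinterpreted as
 * signed, so values more than half the counter range apart compare "wrong".
 * (The cast relies on two's-complement conversion, as GCC and Clang provide.)
 */
static bool time_before32(jiffies32_t a, jiffies32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Old guard: compares against expires even if the timer never started. */
static bool should_arm_old(jiffies32_t expires, jiffies32_t new_expires,
			   bool pending)
{
	(void)pending;
	return time_before32(expires, new_expires);
}

/* Assumed fixed guard: a timer that is not pending is always armed. */
static bool should_arm_fixed(jiffies32_t expires, jiffies32_t new_expires,
			     bool pending)
{
	return !pending || time_before32(expires, new_expires);
}

/* Reproduces the commit's example values on the 32-bit model. */
static void check_commit_example(void)
{
	jiffies32_t expire = 0;                         /* initialized, not started */
	jiffies32_t new_expire = MAX_JIFFIES32 / 2 + 2; /* or more */

	assert(!should_arm_old(expire, new_expire, false));
	assert(should_arm_fixed(expire, new_expire, false));
}
```

With `new_expire` one tick smaller (MAX_JIFFIES32 / 2 + 1) the signed difference is still negative and the old guard arms the timer, which is why the threshold in the description is `+ 2`.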
      
      Fixes: ba6f5e33 ("sctp: avoid refreshing heartbeat timer too often")
      Reported-by: Marcin Stojek <marcin.stojek@nokia.com>
      Tested-by: Marcin Stojek <marcin.stojek@nokia.com>
      Signed-off-by: Maciej Kwiecien <maciej.kwiecien@nokia.com>
      Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phonet: fix building with clang · 6321aa19
      Arnd Bergmann authored
      clang warns about overflowing the data[] member in the struct pnpipehdr:
      
      net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds]
                              if (hdr->data[4] == PEP_IND_READY)
                                  ^         ~
      include/net/phonet/pep.h:66:3: note: array 'data' declared here
                      u8              data[1];
      
      Using a flexible array member at the end of the struct avoids the
      warning, but since we cannot have a flexible array member inside
      of the union, each index now has to be moved back by one, which
      makes it a little uglier.
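
The index shift can be illustrated with two simplified layouts. `hdr_old` and `hdr_new` below are cut-down, hypothetical versions of the header (only one named union member is kept, and `data0` is an assumed name for the byte that stays in the union); they are not the kernel's struct pnpipehdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout (illustrative): one-element array inside the union. */
struct hdr_old {
	uint8_t utid;
	uint8_t message_id;
	uint8_t pipe_handle;
	union {
		uint8_t state_after_connect;
		uint8_t data[1];
	};
};

/* New layout: first byte named in the union, the rest in a flexible array. */
struct hdr_new {
	uint8_t utid;
	uint8_t message_id;
	uint8_t pipe_handle;
	union {
		uint8_t state_after_connect;
		uint8_t data0;
	};
	uint8_t data[];
};

/* The flexible array starts one byte after the old one-element array. */
static_assert(offsetof(struct hdr_old, data) + 1 ==
	      offsetof(struct hdr_new, data),
	      "new data[] begins one byte later than old data[1]");

/* Byte offset of element i in the old and in the new layout. */
static size_t old_data_offset(size_t i)
{
	return offsetof(struct hdr_old, data) + i;
}

static size_t new_data_offset(size_t i)
{
	return offsetof(struct hdr_new, data) + i;
}
```

In this model the byte formerly read as `data[4]` is `data[3]` in the new layout, and the former `data[0]` is reached through `data0`.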
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Rémi Denis-Courmont <remi@remlab.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · b35560e4
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2019-02-21
      
      1) Don't do TX bytes accounting for the esp trailer when sending
         from a request socket as this will result in an out of bounds
         memory write. From Martin Willi.
      
      2) Destroy xfrm_state synchronously on net exit path to
         avoid nested gc flush callbacks that may trigger a
         warning in xfrm6_tunnel_net_exit(). From Cong Wang.
      
      3) Do an unconditional clone in pfkey_broadcast_one()
         to avoid a race when freeing the skb.
         From Sean Tranchetti.
      
      4) Fix inbound traffic via XFRM interfaces across network
         namespaces. We did the lookup for interfaces and policies
         in the wrong namespace. From Tobias Brunner.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'report-erspan-version-field-just-for-erspan-tunnels' · 31088cb5
      David S. Miller authored
      Lorenzo Bianconi says:
      
      ====================
      report erspan version field just for erspan tunnels
      
      Do not report erspan_version to userspace for non-erspan tunnels.
      Report IFLA_GRE_ERSPAN_INDEX only for erspan version 1 in
      ip6gre_fill_info
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap · 103d0244
      Lorenzo Bianconi authored
      Report erspan version field to userspace in ip6gre_fill_info just for
      erspan_v6 tunnels. Moreover report IFLA_GRE_ERSPAN_INDEX only for
      erspan version 1.
      The issue can be triggered with the following reproducer:
      
      $ip link add name gre6 type ip6gre local 2001::1 remote 2002::2
      $ip link set gre6 up
      $ip -d link sh gre6
      14: gre6@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/gre6 2001::1 peer 2002::2 promiscuity 0 minmtu 0 maxmtu 0
          ip6gre remote 2002::2 local 2001::1 hoplimit 64 encaplimit 4 tclass 0x00 flowlabel 0x00000 erspan_index 0 erspan_ver 0 addrgenmode eui64
      
      Fixes: 94d7d8f2 ("ip6_gre: add erspan v2 support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip_gre: do not report erspan_ver for gre or gretap · 2bdf700e
      Lorenzo Bianconi authored
      Report erspan version field to userspace in ipgre_fill_info just for
      erspan tunnels. The issue can be triggered with the following reproducer:
      
      $ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1
      $ip link set dev gre1 up
      $ip -d link sh gre1
      13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0
          gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1
      
      Fixes: f551c91d ("net: erspan: introduce erspan v2 for ip_gre")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Feb, 2019 19 commits
    • net: avoid false positives in untrusted gso validation · 9e8db591
      Willem de Bruijn authored
      GSO packets with vnet_hdr must conform to a small set of gso_types.
      The below commit uses flow dissection to drop packets that do not.
      
      But it has false positives when the skb is not fully initialized.
      Dissection needs skb->protocol and skb->network_header.
      
      Infer skb->protocol from gso_type as the two must agree.
      SKB_GSO_UDP can use both ipv4 and ipv6, so try both.
      
      Exclude callers for which network header offset is not known.
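
The inference step can be sketched as a small user-space model. Everything below is hypothetical scaffolding (`gso_kind`, `l3_proto`, `infer_proto`, and a toy dissector that only looks at the IP version nibble); it shows the "derive the protocol from the GSO type, and try both families for UDP" idea, not the kernel's flow dissector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified types; not the kernel's gso_type or ethertypes. */
enum gso_kind { GSO_TCPV4, GSO_TCPV6, GSO_UDP };
enum l3_proto { L3_NONE, L3_IPV4, L3_IPV6 };

typedef bool (*dissect_fn)(enum l3_proto proto, const unsigned char *pkt);

/* Toy dissector: accepts the packet if its version nibble matches proto. */
static bool dissect_by_version_nibble(enum l3_proto proto,
				      const unsigned char *pkt)
{
	unsigned int version = pkt[0] >> 4;

	return (proto == L3_IPV4 && version == 4) ||
	       (proto == L3_IPV6 && version == 6);
}

/*
 * Infer the network protocol from the GSO kind and confirm it by dissection.
 * UDP GSO does not say whether it is IPv4 or IPv6, so both are tried.
 * Returns L3_NONE when no candidate dissects, i.e. the packet is rejected.
 */
static enum l3_proto infer_proto(enum gso_kind kind, const unsigned char *pkt,
				 dissect_fn dissect)
{
	assert(pkt != NULL && dissect != NULL);

	switch (kind) {
	case GSO_TCPV4:
		return dissect(L3_IPV4, pkt) ? L3_IPV4 : L3_NONE;
	case GSO_TCPV6:
		return dissect(L3_IPV6, pkt) ? L3_IPV6 : L3_NONE;
	case GSO_UDP:
		if (dissect(L3_IPV4, pkt))
			return L3_IPV4;
		if (dissect(L3_IPV6, pkt))
			return L3_IPV6;
		return L3_NONE;
	}
	return L3_NONE;
}
```

Passing the dissector as a function pointer keeps the retry logic separate from how a packet is actually parsed.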
      
      Fixes: d5be7f63 ("net: validate untrusted gso packets without csum offload")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-improvement-for-wait-and-wakeup' · 06cd1702
      David S. Miller authored
      Tung Nguyen says:
      
      ====================
      tipc: improvement for wait and wakeup
      
      Some improvements for tipc_wait_for_xzy().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve function tipc_wait_for_rcvmsg() · 48766a58
      Tung Nguyen authored
      This commit replaces schedule_timeout() with wait_woken()
      in function tipc_wait_for_rcvmsg(). wait_woken() uses
      memory barriers in its implementation to avoid potential
      race condition when putting a process into sleeping state
      and then waking it up.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve function tipc_wait_for_cond() · 223b7329
      Tung Nguyen authored
      Commit 844cf763 ("tipc: make macro tipc_wait_for_cond() smp safe")
      replaced finish_wait() with remove_wait_queue() but still used
      prepare_to_wait(). This causes an unnecessary conditional
      check before adding to the wait queue in prepare_to_wait().
      
      This commit replaces prepare_to_wait() with add_wait_queue()
      as the pair function with remove_wait_queue().
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix PACKET_ORIGDEV regression · 3c963a33
      Michal Soltys authored
      This patch fixes a subtle PACKET_ORIGDEV regression which was a side
      effect of fixes introduced by:
      
      6a9e461f bonding: pass link-local packets to bonding master also.
      
      ... to:
      
      b89f04c6 bonding: deliver link-local packets with skb->dev set to link that packets arrived on
      
      While 6a9e461f restored pre-b89f04c6 presence of link-local
      packets on bonding masters (which is required e.g. by linux bridges
      participating in spanning tree or needed for lab-like setups created
      with group_fwd_mask) it also caused the originating device
      information to be lost due to cloning.
      
      Maciej Żenczykowski proposed another solution that doesn't require
      packet cloning and retains original device information - instead of
      returning RX_HANDLER_PASS for all link-local packets it's now limited
      only to packets from inactive slaves.
      
      At the same time, packets passed to bonding masters retain correct
      information about the originating device and PACKET_ORIGDEV can be used
      to determine it.
      
      This elegantly solves all issues so far:
      
      - link-local packets that were removed from bonding masters
      - LLDP daemons being forced to explicitly bind to slave interfaces
      - PACKET_ORIGDEV having no effect on bond interfaces
      
      Fixes: 6a9e461f (bonding: pass link-local packets to bonding master also.)
      Reported-by: Vincent Bernat <vincent@bernat.ch>
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vrf: remove MTU limits for vrf device · ad49bc63
      Hangbin Liu authored
      Similar to commit e94cd811 ("net: remove MTU limits for dummy and
      ifb device"), MTU is irrelevant for a VRF device. Initializing it to 64K
      while limiting it to [68, 1500] may confuse users.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: mark CAIF as orphan · 18de100e
      Jann Horn authored
      The listed address for the CAIF maintainer bounces with
      "553 5.3.0 <dmitry.tarnyagin@lockless.no>... No such user here", and the
      only existing email address of the maintainer in git history hasn't
      responded in a week.
      Therefore, remove the listed maintainer and mark CAIF as orphan.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 033575ec
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Fixes 2019-02-21
      
      This series contains fixes to ixgbe and i40e.
      
      The majority of the fixes resolve XDP issues found in both drivers;
      only one fix is not XDP related.  That one fix resolves
      an issue seen on older 10GbE devices, where UDP traffic was either being
      dropped or being transmitted out of order when the bit to enable L3/L4
      filtering for transmit switched packets is enabled on older devices that
      did not support this option.
      
      Magnus fixes an XDP issue for both ixgbe and i40e, where receive rings
      are created but no buffers are allocated for AF_XDP in zero-copy mode,
      so no packets can be received and no interrupts will be generated so
      that NAPI poll function that allocates buffers to the rings will never
      get executed.
      
      Björn fixes a race in XDP xmit ring cleanup for i40e, where
      ndo_xdp_xmit() must be taken into consideration.  Added a
      synchronize_rcu() to wait for napi(s) before clearing the queue.
      
      Jan fixes an ixgbe AF_XDP zero-copy transmit issue which can cause a
      reset to be triggered, so add a check to ensure that netif carrier is
      'ok' before trying to transmit packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK · c685c69f
      Jan Sokolowski authored
      An issue has been found while testing zero-copy XDP that
      causes a reset to be triggered. As it takes some time to
      turn the carrier on after setting zc, and we already
      start trying to transmit some packets, watchdog considers
      this as an erroneous state and triggers a reset.
      
      Don't do any work if netif carrier is not OK.
      
      Fixes: 8221c5eb (ixgbe: add AF_XDP zero-copy Tx support)
      Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: fix XDP_REDIRECT/XDP xmit ring cleanup race · 59eb2a88
      Björn Töpel authored
      When the driver clears the XDP xmit ring due to re-configuration or
      teardown, in-progress ndo_xdp_xmit must be taken into consideration.
      
      The ndo_xdp_xmit function is typically called from a NAPI context that
      the driver does not control. Therefore, we must be careful not to
      clear the XDP ring, while the call is on-going. This patch adds a
      synchronize_rcu() to wait for napi(s) (preempt-disable regions and
      softirqs) prior to clearing the queue. Further, the __I40E_CONFIG_BUSY
      flag is checked in the ndo_xdp_xmit implementation to avoid touching
      the XDP xmit queue during re-configuration.
      
      Fixes: d9314c47 ("i40e: add support for XDP_REDIRECT")
      Fixes: 123cecd4 ("i40e: added queue pair disable/enable functions")
      Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: fix potential RX buffer starvation for AF_XDP · 4a9b32f3
      Magnus Karlsson authored
      When the RX rings are created they are also populated with buffers so
      that packets can be received. Usually these are kernel buffers, but
      for AF_XDP in zero-copy mode, these are user-space buffers and in this
      case the application might not have sent down any buffers to the
      driver at this point. And if no buffers are allocated at ring creation
      time, no packets can be received and no interrupts will be generated so
      the NAPI poll function that allocates buffers to the rings will never
      get executed.
      
      To rectify this, we kick the NAPI context of any queue with an
      attached AF_XDP zero-copy socket in two places in the code. Once after
      an XDP program has loaded and once after the umem is registered.  This
      takes care of both cases: XDP program gets loaded first then AF_XDP
      socket is created, and the reverse, AF_XDP socket is created first,
      then XDP program is loaded.
      
      Fixes: d0bcacd0 ("ixgbe: add AF_XDP zero-copy Rx support")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: fix potential RX buffer starvation for AF_XDP · 14ffeb52
      Magnus Karlsson authored
      When the RX rings are created they are also populated with buffers
      so that packets can be received. Usually these are kernel buffers,
      but for AF_XDP in zero-copy mode, these are user-space buffers and
      in this case the application might not have sent down any buffers
      to the driver at this point. And if no buffers are allocated at ring
      creation time, no packets can be received and no interrupts will be
      generated so the NAPI poll function that allocates buffers to the
      rings will never get executed.
      
      To rectify this, we kick the NAPI context of any queue with an
      attached AF_XDP zero-copy socket in two places in the code. Once
      after an XDP program has loaded and once after the umem is registered.
      This takes care of both cases: XDP program gets loaded first then AF_XDP
      socket is created, and the reverse, AF_XDP socket is created first,
      then XDP program is loaded.
      
      Fixes: 0a714186 ("i40e: add AF_XDP zero-copy Rx support")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: fix older devices that do not support IXGBE_MRQC_L3L4TXSWEN · 156a67a9
      Jeff Kirsher authored
      Enabling L3/L4 filtering for transmit switched packets on all
      devices caused an unforeseen issue on older devices when trying to send UDP
      traffic in an ordered sequence.  This bit was originally intended for X550
      devices, which support this feature, so limit the scope of this bit to
      X550 devices only.
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    • net/smc: fix smc_poll in SMC_INIT state · d7cf4a3b
      Ursula Braun authored
      smc_poll() returns with mask bit EPOLLPRI if the connection urg_state
      is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll signals
      EPOLLPRI erroneously if called in state SMC_INIT before the connection
      is created, for instance in a non-blocking connect scenario.
      
      This patch switches to non-zero values for the urg states.
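
      A minimal sketch of why a zero-valued state is a problem, assuming
      the connection structure is zero-filled before the connection is
      created. The enum names, their numeric values and struct conn_model
      are hypothetical stand-ins, not the net/smc definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical "before" numbering: the VALID state is 0. */
enum urg_state_old { OLD_URG_VALID = 0, OLD_URG_NOTYET, OLD_URG_READ };

/* Hypothetical "after" numbering: every state is non-zero. */
enum urg_state_new { NEW_URG_VALID = 1, NEW_URG_NOTYET, NEW_URG_READ };

/* Stand-in for the connection; only the field that matters here. */
struct conn_model {
    int urg_state;
};

/* A connection that has not been set up yet is all zeroes. */
static struct conn_model fresh_conn(void)
{
    struct conn_model c;
    memset(&c, 0, sizeof(c));
    return c;
}

/* Poll-style checks: would the urgent-data bit be reported? */
static bool old_poll_reports_pri(const struct conn_model *c)
{
    return c->urg_state == OLD_URG_VALID;
}

static bool new_poll_reports_pri(const struct conn_model *c)
{
    return c->urg_state == NEW_URG_VALID;
}

/* A zero-filled connection matches VALID only under the old numbering. */
static void urg_state_demo(void)
{
    struct conn_model c = fresh_conn();

    assert(old_poll_reports_pri(&c));   /* spurious urgent-data report */
    assert(!new_poll_reports_pri(&c));  /* zero no longer means VALID */
}
```

      With non-zero values, an untouched (zeroed) state field matches
      none of the named states, so the check cannot fire by accident.
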
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Fixes: de8474eb ("net/smc: urgent data support")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7cf4a3b
    • David S. Miller's avatar
      Merge branch 'ipv6-route-rcu' · 64cc41e6
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      ipv6: route: enforce RCU protection for fib6_info->from
      
      This series addresses a couple of RCU leftovers dating back to the
      rt6_info->from conversion to RCU
      
      v1 -> v2:
       - fix a possible race in patch 1
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64cc41e6
    • Paolo Abeni's avatar
      ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink() · bf1dc8ba
      Paolo Abeni authored
      We need an RCU critical section around the rt6_info->from dereference,
      and proper annotation.
      
      Fixes: 4ed591c8 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf1dc8ba
    • Paolo Abeni's avatar
      ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt() · 193f3685
      Paolo Abeni authored
      We must access rt6_info->from under RCU read lock: move the
      dereference under such lock, with proper annotation.
      
      v1 -> v2:
       - avoid using multiple, racy, fetch operations for rt->from
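
      The "fetch once" shape from the v2 note can be sketched in
      userspace as follows. This is only a model: a C11 acquire load
      stands in for the kernel's rcu_dereference() under rcu_read_lock(),
      and struct from_model, struct rt_model and update_stamp_once() are
      hypothetical names, not the ipv6 route code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the object that rt->from points to. */
struct from_model {
    int stamp;
};

/* Stand-in for the route entry; 'from' may be cleared concurrently. */
struct rt_model {
    _Atomic(struct from_model *) from;
};

/*
 * Load the pointer exactly once into a local and use only that local.
 * Testing rt->from and then reading rt->from again would be two
 * independent fetches, and the second one could observe NULL.
 * Returns 1 if the stamp was updated, 0 if there was nothing to update.
 */
static int update_stamp_once(struct rt_model *rt, int now)
{
    struct from_model *from =
        atomic_load_explicit(&rt->from, memory_order_acquire);

    if (!from)
        return 0;

    from->stamp = now;
    return 1;
}

/* Exercise both the NULL and the non-NULL case. */
static void single_fetch_demo(void)
{
    struct from_model f = { .stamp = 0 };
    struct rt_model rt;

    atomic_init(&rt.from, NULL);
    assert(update_stamp_once(&rt, 5) == 0);

    atomic_store_explicit(&rt.from, &f, memory_order_release);
    assert(update_stamp_once(&rt, 7) == 1);
    assert(f.stamp == 7);
}
```

      The model does not cover object lifetime, which in the kernel is
      what the RCU read-side critical section provides.
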
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      193f3685
    • Al Viro's avatar
      missing barriers in some of unix_sock ->addr and ->path accesses · ae3b5641
      Al Viro authored
      Several u->addr and u->path users are not holding any locks in
      common with unix_bind().  unix_state_lock() is useless for those
      purposes.
      
      u->addr is assign-once and *(u->addr) is fully set up by the time
      we set u->addr (all under unix_table_lock).  u->path is also
      set in the same critical area, also before setting u->addr, and
      any unix_sock with ->path filled will have non-NULL ->addr.
      
      So setting ->addr with smp_store_release() is all we need for those
      "lockless" users - just have them fetch ->addr with smp_load_acquire()
      and don't even bother looking at ->path if they see NULL ->addr.
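
      The publish/consume pattern described here can be sketched in
      userspace with C11 atomics, using a release store and an acquire
      load in place of smp_store_release() and smp_load_acquire(). The
      types and function names below are hypothetical stand-ins, not the
      af_unix code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the address object; fully set up before publication. */
struct addr_model {
    int  len;
    char name[16];
};

/* Stand-in for the socket: path is written first, addr is assign-once. */
struct sock_model {
    const char *path;
    _Atomic(struct addr_model *) addr;
};

static void model_init(struct sock_model *u)
{
    u->path = NULL;
    atomic_init(&u->addr, NULL);
}

/*
 * Writer side: set path, then publish addr with release semantics so
 * that every store made before it (including *a and path) is visible
 * to any reader whose acquire load observes the non-NULL pointer.
 */
static void model_bind(struct sock_model *u, struct addr_model *a,
                       const char *path)
{
    u->path = path;
    atomic_store_explicit(&u->addr, a, memory_order_release);
}

/*
 * Lockless reader: acquire-load addr and do not look at path at all
 * when addr is still NULL.
 */
static const char *model_get_path(struct sock_model *u)
{
    struct addr_model *a =
        atomic_load_explicit(&u->addr, memory_order_acquire);

    if (!a)
        return NULL;
    return u->path;
}

/* Unbound sockets report no path; bound ones report the one published. */
static void publish_demo(void)
{
    static struct addr_model a = { .len = 3, .name = "abc" };
    static const char path[] = "/tmp/sock";
    struct sock_model u;

    model_init(&u);
    assert(model_get_path(&u) == NULL);

    model_bind(&u, &a, path);
    assert(model_get_path(&u) == path);
}
```

      Because addr is assigned only once, a reader that sees it non-NULL
      never has to worry about it changing afterwards.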
      
      Users of ->addr and ->path fall into several classes now:
          1) ones that do smp_load_acquire(u->addr) and access *(u->addr)
      and u->path only if smp_load_acquire() has returned non-NULL.
          2) places holding unix_table_lock.  These are guaranteed that
      *(u->addr) is seen fully initialized.  If unix_sock is in one of the
      "bound" chains, so's ->path.
          3) unix_sock_destructor() using ->addr is safe.  All places
      that set u->addr are guaranteed to have seen all stores to *(u->addr)
      while holding a reference to u and unix_sock_destructor() is called
      when (atomic) refcount hits zero.
          4) unix_release_sock() using ->path is safe.  unix_bind()
      is serialized wrt unix_release() (normally - by struct file
      refcount), and for the instances that had ->path set by unix_bind()
      unix_release_sock() comes from unix_release(), so they are fine.
      Instances that had it set in unix_stream_connect() either end up
      attached to a socket (in unix_accept()), in which case the call
      chain to unix_release_sock() and serialization are the same as in
      the previous case, or they never get accept'ed and unix_release_sock()
      is called when the listener is shut down and its queue gets purged.
      In that case the listener's queue lock provides the barriers needed -
      unix_stream_connect() shoves our unix_sock into listener's queue
      under that lock right after having set ->path and eventual
      unix_release_sock() caller picks them from that queue under the
      same lock right before calling unix_release_sock().
          5) unix_find_other() use of ->path is pointless, but safe -
      it happens with successful lookup by (abstract) name, so ->path.dentry
      is guaranteed to be NULL there.
      earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae3b5641
    • Russell King's avatar
      net: marvell: mvneta: fix DMA debug warning · a8fef9ba
      Russell King authored
      Booting 4.20 on SolidRun Clearfog issues this warning with DMA API
      debug enabled:
      
      WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
      mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
      Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
      CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
      Hardware name: Marvell Armada 380/385 (Device Tree)
      [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
      [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
      [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
      [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
      [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
      [<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
      [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
      [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
      [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
      [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
      [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
      [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
      [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
      ...
      
      This appears to be caused by mvneta_rx_hwbm() calling
      dma_sync_single_range_for_cpu() with the wrong struct device pointer,
      as the buffer manager device pointer is used to map and unmap the
      buffer.  Fix this.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8fef9ba
  3. 20 Feb, 2019 2 commits
    • Russell King's avatar
      net: dsa: fix unintended change of bridge interface STP state · 9c2054a5
      Russell King authored
      When a DSA port is added to a bridge and brought up, the resulting STP
      state programmed into the hardware depends on the order in which these
      operations are performed.  However, the Linux bridge code believes that
      the port is in disabled mode.
      
      If the DSA port is first added to a bridge and then brought up, it will
      be in blocking mode.  If it is brought up and then added to the bridge,
      it will be in disabled mode.
      
      This difference is caused by DSA always setting the STP mode in
      dsa_port_enable() whether or not this port is part of a bridge.  Since
      bridge always sets the STP state when the port is added, brought up or
      taken down, it is unnecessary for us to manipulate the STP state.
      
      Apparently, this code was copied from Rocker, and the very next day a
      similar fix for Rocker was merged but was not propagated to DSA.  See
      e47172ab ("rocker: put port in FORWADING state after leaving bridge")
      
      Fixes: b73adef6 ("net: dsa: integrate with SWITCHDEV for HW bridging")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c2054a5
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 40e196a9
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix suspend and resume in mt76x0u USB driver, from Stanislaw
          Gruszka.
      
       2) Missing memory barriers in xsk, from Magnus Karlsson.
      
       3) rhashtable fixes in mac80211 from Herbert Xu.
      
       4) 32-bit MIPS eBPF JIT fixes from Paul Burton.
      
       5) Fix for_each_netdev_feature() on big endian, from Hauke Mehrtens.
      
       6) GSO validation fixes from Willem de Bruijn.
      
       7) Endianness fix for dwmac4 timestamp handling, from Alexandre Torgue.
      
       8) More strict checks in tcp_v4_err(), from Eric Dumazet.
      
       9) af_alg_release should NULL out the sk after the sock_put(), from Mao
          Wenan.
      
      10) Missing unlock in mac80211 mesh error path, from Wei Yongjun.
      
      11) Missing device put in hns driver, from Salil Mehta.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        sky2: Increase D3 delay again
        vhost: correctly check the return value of translate_desc() in log_used()
        net: netcp: Fix ethss driver probe issue
        net: hns: Fixes the missing put_device in positive leg for roce reset
        net: stmmac: Fix a race in EEE enable callback
        qed: Fix iWARP syn packet mac address validation.
        qed: Fix iWARP buffer size provided for syn packet processing.
        r8152: Add support for MAC address pass through on RTL8153-BD
        mac80211: mesh: fix missing unlock on error in table_path_del()
        net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"
        net: crypto set sk to NULL when af_alg_release.
        net: Do not allocate page fragments that are not skb aligned
        mm: Use fixed constant in page_frag_alloc instead of size + 1
        tcp: tcp_v4_err() should be more careful
        tcp: clear icsk_backoff in tcp_write_queue_purge()
        net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
        qmi_wwan: apply SET_DTR quirk to Sierra WP7607
        net: stmmac: handle endianness in dwmac4_get_timestamp
        doc: Mention MSG_ZEROCOPY implementation for UDP
        mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable
        ...
      40e196a9
  4. 19 Feb, 2019 8 commits