1. 05 Feb, 2019 7 commits
    • Or Gerlitz's avatar
      net/mlx5e: Properly set steering match levels for offloaded TC decap rules · 6363651d
      Or Gerlitz authored
      The match level computed by the driver gets to be wrong for decap
      rules with wildcarded inner packet match such as:
      
      tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower
             enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789
             action tunnel_key unset
             action mirred egress redirect dev eth1
      
      The FW errs for a missing matching meta-data indicator for the outer
      headers (where we do have a match), and a wrong matching meta-data
      indicator for the inner headers (where we don't have a match).
      
      Fix that by taking into account the matching on the tunnel info and
      relating the match level of the encapsulated packet to the firmware
      inner headers indicator in case of decap.
      
      As for vxlan we mandate a match on the tunnel udp dst port, and in general
      we practically madndate a match on the source or dest ip for any IP tunnel,
      the fix was done in a minimal manner around the tunnel match parsing code.
      
      Fixes: d708f902 ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: default avatarSlava Ovsiienko <viacheslavo@mellanox.com>
      Reviewed-by: default avatarJianbo Liu <jianbol@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      6363651d
    • Raed Salem's avatar
      net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance · 82eaa1fa
      Raed Salem authored
      At Innova IPsec TX offload data path a special software parser metadata
      is used to pass some packet attributes to the hardware, this metadata
      is passed using the Ethernet control segment of a WQE (a HW descriptor)
      header.
      
      The cited commit might nullify this header, hence the metadata is lost,
      this caused a significant performance drop during hw offloading
      operation.
      
      Fix by restoring the metadata at the Ethernet control segment in case
      it was nullified.
      
      Fixes: 37fdffb2 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
      Signed-off-by: default avatarRaed Salem <raeds@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      82eaa1fa
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f09bef61
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Use CONFIG_NF_TABLES_INET from seltests, not NF_TABLES_INET.
         From Naresh Kamboju.
      
      2) Add a test to cover masquerading and redirect case, from Florian
         Westphal.
      
      3) Two packets coming from the same socket may race to set up NAT,
         ending up with different tuples and the packet losing race being
         dropped. Update nf_conntrack_tuple_taken() to exercise clash
         resolution for this case. From Martynas Pumputis and Florian
         Westphal.
      
      4) Unbind anonymous sets from the commit and abort path, this fixes
         a splat due to double set list removal/release in case that the
         transaction needs to be aborted.
      
      5) Do not preserve original output interface for packets that are
         redirected in the output chain when ip6_route_me_harder() is
         called. Otherwise packets end up going not going to the loopback
         device. From Eli Cooper.
      
      6) Fix bogus splat in nft_compat with CONFIG_REFCOUNT_FULL=y, this
         also simplifies the existing logic to deal with the list insertions
         of the xtables extensions. From Florian Westphal.
      
      Diffstat look rather larger than usual because of the new selftest, but
      Florian and I consider that having tests soon into the tree is good to
      improve coverage. If there's a different policy in this regard, please,
      let me know.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f09bef61
    • Florian Westphal's avatar
      netfilter: nft_compat: don't use refcount_inc on newly allocated entry · 947e492c
      Florian Westphal authored
      When I moved the refcount to refcount_t type I missed the fact that
      refcount_inc() will result in use-after-free warning with
      CONFIG_REFCOUNT_FULL=y builds.
      
      The correct fix would be to init the reference count to 1 at allocation
      time, but, unfortunately we cannot do this, as we can't undo that
      in case something else fails later in the batch.
      
      So only solution I see is to special-case the 'new entry' condition
      and replace refcount_inc() with a "delayed" refcount_set(1) in this case,
      as done here.
      
      The .activate callback can be removed to simplify things, we only
      need to make sure that deactivate() decrements/unlinks the entry
      from the list at end of transaction phase (commit or abort).
      
      Fixes: 12c44aba ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count")
      Reported-by: default avatarJordan Glover <Golden_Miller83@protonmail.ch>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      947e492c
    • Eli Cooper's avatar
      netfilter: ipv6: Don't preserve original oif for loopback address · 15df03c6
      Eli Cooper authored
      Commit 508b0904 ("netfilter: ipv6: Preserve link scope traffic
      original oif") made ip6_route_me_harder() keep the original oif for
      link-local and multicast packets. However, it also affected packets
      for the loopback address because it used rt6_need_strict().
      
      REDIRECT rules in the OUTPUT chain rewrite the destination to loopback
      address; thus its oif should not be preserved. This commit fixes the bug
      that redirected local packets are being dropped. Actually the packet was
      not exactly dropped; Instead it was sent out to the original oif rather
      than lo. When a packet with daddr ::1 is sent to the router, it is
      effectively dropped.
      
      Fixes: 508b0904 ("netfilter: ipv6: Preserve link scope traffic original oif")
      Signed-off-by: default avatarEli Cooper <elicooper@gmx.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      15df03c6
    • Marc Zyngier's avatar
      net: dsa: Fix lockdep false positive splat · c8101f77
      Marc Zyngier authored
      Creating a macvtap on a DSA-backed interface results in the following
      splat when lockdep is enabled:
      
      [   19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
      [   23.041198] device lan0 entered promiscuous mode
      [   23.043445] device eth0 entered promiscuous mode
      [   23.049255]
      [   23.049557] ============================================
      [   23.055021] WARNING: possible recursive locking detected
      [   23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 #118 Not tainted
      [   23.066132] --------------------------------------------
      [   23.071598] ip/2861 is trying to acquire lock:
      [   23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38
      [   23.083693]
      [   23.083693] but task is already holding lock:
      [   23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
      [   23.096774]
      [   23.096774] other info that might help us debug this:
      [   23.103494]  Possible unsafe locking scenario:
      [   23.103494]
      [   23.109584]        CPU0
      [   23.112093]        ----
      [   23.114601]   lock(_xmit_ETHER);
      [   23.117917]   lock(_xmit_ETHER);
      [   23.121233]
      [   23.121233]  *** DEADLOCK ***
      [   23.121233]
      [   23.127325]  May be due to missing lock nesting notation
      [   23.127325]
      [   23.134315] 2 locks held by ip/2861:
      [   23.137987]  #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0
      [   23.146231]  #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
      [   23.153757]
      [   23.153757] stack backtrace:
      [   23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 #118
      [   23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
      [   23.172843] Call trace:
      [   23.175358]  dump_backtrace+0x0/0x188
      [   23.179116]  show_stack+0x14/0x20
      [   23.182524]  dump_stack+0xb4/0xec
      [   23.185928]  __lock_acquire+0x123c/0x1860
      [   23.190048]  lock_acquire+0xc8/0x248
      [   23.193724]  _raw_spin_lock_bh+0x40/0x58
      [   23.197755]  dev_set_rx_mode+0x1c/0x38
      [   23.201607]  dev_set_promiscuity+0x3c/0x50
      [   23.205820]  dsa_slave_change_rx_flags+0x5c/0x70
      [   23.210567]  __dev_set_promiscuity+0x148/0x1e0
      [   23.215136]  __dev_set_rx_mode+0x74/0x98
      [   23.219167]  dev_uc_add+0x54/0x70
      [   23.222575]  macvlan_open+0x170/0x1d0
      [   23.226336]  __dev_open+0xe0/0x160
      [   23.229830]  __dev_change_flags+0x16c/0x1b8
      [   23.234132]  dev_change_flags+0x20/0x60
      [   23.238074]  do_setlink+0x2d0/0xc50
      [   23.241658]  __rtnl_newlink+0x5f8/0x6e8
      [   23.245601]  rtnl_newlink+0x50/0x78
      [   23.249184]  rtnetlink_rcv_msg+0x360/0x4e0
      [   23.253397]  netlink_rcv_skb+0xe8/0x130
      [   23.257338]  rtnetlink_rcv+0x14/0x20
      [   23.261012]  netlink_unicast+0x190/0x210
      [   23.265043]  netlink_sendmsg+0x288/0x350
      [   23.269075]  sock_sendmsg+0x18/0x30
      [   23.272659]  ___sys_sendmsg+0x29c/0x2c8
      [   23.276602]  __sys_sendmsg+0x60/0xb8
      [   23.280276]  __arm64_sys_sendmsg+0x1c/0x28
      [   23.284488]  el0_svc_common+0xd8/0x138
      [   23.288340]  el0_svc_handler+0x24/0x80
      [   23.292192]  el0_svc+0x8/0xc
      
      This looks fairly harmless (no actual deadlock occurs), and is
      fixed in a similar way to c6894dec ("bridge: fix lockdep
      addr_list_lock false positive splat") by putting the addr_list_lock
      in its own lockdep class.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8101f77
    • Rundong Ge's avatar
      net: dsa: slave: Don't propagate flag changes on down slave interfaces · 17ab4f61
      Rundong Ge authored
      The unbalance of master's promiscuity or allmulti will happen after ifdown
      and ifup a slave interface which is in a bridge.
      
      When we ifdown a slave interface , both the 'dsa_slave_close' and
      'dsa_slave_change_rx_flags' will clear the master's flags. The flags
      of master will be decrease twice.
      In the other hand, if we ifup the slave interface again, since the
      slave's flags were cleared the 'dsa_slave_open' won't set the master's
      flag, only 'dsa_slave_change_rx_flags' that triggered by 'br_add_if'
      will set the master's flags. The flags of master is increase once.
      
      Only propagating flag changes when a slave interface is up makes
      sure this does not happen. The 'vlan_dev_change_rx_flags' had the
      same problem and was fixed, and changes here follows that fix.
      
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: default avatarRundong Ge <rdong.ge@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17ab4f61
  2. 04 Feb, 2019 18 commits
  3. 03 Feb, 2019 7 commits
  4. 01 Feb, 2019 8 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e7b81641
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2019-01-31
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) disable preemption in sender side of socket filters, from Alexei.
      
      2) fix two potential deadlocks in syscall bpf lookup and prog_register,
         from Martin and Alexei.
      
      3) fix BTF to allow typedef on func_proto, from Yonghong.
      
      4) two bpftool fixes, from Jiri and Paolo.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7b81641
    • Eric Dumazet's avatar
      dccp: fool proof ccid_hc_[rt]x_parse_options() · 9b1f19d8
      Eric Dumazet authored
      Similarly to commit 276bdb82 ("dccp: check ccid before dereferencing")
      it is wise to test for a NULL ccid.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
      RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
      Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
      kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
      RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
      RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
      RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
      R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
      kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
       dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
       sk_backlog_rcv include/net/sock.h:936 [inline]
       __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
       dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
       ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
       ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
       dst_input include/net/dst.h:450 [inline]
       ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
       __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
       __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
       process_backlog+0x206/0x750 net/core/dev.c:5923
       napi_poll net/core/dev.c:6346 [inline]
       net_rx_action+0x76d/0x1930 net/core/dev.c:6412
       __do_softirq+0x30b/0xb11 kernel/softirq.c:292
       run_ksoftirqd kernel/softirq.c:654 [inline]
       run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
       smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
       kthread+0x357/0x430 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace 58a0ba03bea2c376 ]---
      RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
      RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
      Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
      RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
      RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
      RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
      R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b1f19d8
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · ec34f792
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2019-01-30
      
      here are some fixes in different areas of the smc code for the net
      tree.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ec34f792
    • Karsten Graul's avatar
      net/smc: fix use of variable in cleared area · 46ad0222
      Karsten Graul authored
      Do not use pend->idx as index for the arrays because its value is
      located in the cleared area. Use the existing local variable instead.
      Without this fix the wrong area might be cleared.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46ad0222
    • Karsten Graul's avatar
      net/smc: use device link provided in qp_context · e5f3aa04
      Karsten Graul authored
      The device field of the IB event structure does not always point to the
      SMC IB device. Load the pointer from the qp_context which is always
      provided to smc_ib_qp_event_handler() in the priv field. And for qp
      events the affected port is given in the qp structure of the ibevent,
      derive it from there.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5f3aa04
    • Karsten Graul's avatar
      net/smc: call smc_cdc_msg_send() under send_lock · 2dee25af
      Karsten Graul authored
      Call smc_cdc_msg_send() under the connection send_lock to make sure all
      send operations for one connection are serialized.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2dee25af
    • Karsten Graul's avatar
      net/smc: do not wait under send_lock · 33f3fcc2
      Karsten Graul authored
      smc_cdc_get_free_slot() might wait for free transfer buffers when using
      SMC-R. This wait should not be done under the send_lock, which is a
      spin_lock. This fixes a cpu loop in parallel threads waiting for the
      send_lock.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33f3fcc2
    • Karsten Graul's avatar
      net/smc: recvmsg and splice_read should return 0 after shutdown · 51c5aba3
      Karsten Graul authored
      When a socket was connected and is now shut down for read, return 0 to
      indicate end of data in recvmsg and splice_read (like TCP) and do not
      return ENOTCONN. This behavior is required by the socket api.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51c5aba3