1. 01 May, 2017 40 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · a01aa920
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree. A large bunch of code cleanups, simplify the conntrack extension
      codebase, get rid of the fake conntrack object, speed up netns by
      selective synchronize_net() calls. More specifically, they are:
      
      1) Check for ct->status bit instead of using nfct_nat() from IPVS and
         Netfilter codebase, patch from Florian Westphal.
      
      2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.
      
      3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.
      
      4) Introduce nft_is_base_chain() helper function.
      
      5) Enforce expectation limit from userspace conntrack helper,
         from Gao Feng.
      
      6) Add nf_ct_remove_expect() helper function, from Gao Feng.
      
      7) NAT mangle helper function return boolean, from Gao Feng.
      
      8) ctnetlink_alloc_expect() should only work for conntrack with
         helpers, from Gao Feng.
      
      9) Add nfnl_msg_type() helper function to nfnetlink to build the
         netlink message type.
      
      10) Get rid of unnecessary cast on void, from simran singhal.
      
      11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
          also from simran singhal.
      
      12) Use list_prev_entry() from nf_tables, from simran singhal.
      
      13) Remove unnecessary & on pointer function in the Netfilter and IPVS
          code.
      
      14) Remove obsolete comment on set of rules per CPU in ip6_tables,
          no longer true. From Arushi Singhal.
      
      15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.
      
      16) Remove unnecessary nested rcu_read_lock() in
          __nf_nat_decode_session(). Code running from hooks is already
          guaranteed to run under RCU read side.
      
      17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.
      
      18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
          also from Aaron.
      
      19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.
      
      20) Don't propagate NF_DROP error to userspace via ctnetlink in
          __nf_nat_alloc_null_binding() function, from Gao Feng.
      
      21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
          from Gao Feng.
      
      22) Kill the fake untracked conntrack objects, use ctinfo instead to
          annotate that a conntrack object is untracked, from Florian Westphal.
      
      23) Remove nf_ct_is_untracked(), now obsolete since we have no
          conntrack template anymore, from Florian.
      
      24) Add event mask support to nft_ct, also from Florian.
      
      25) Move nf_conn_help structure to
          include/net/netfilter/nf_conntrack_helper.h.
      
      26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
          Thus, we don't deal with variable conntrack extensions anymore.
          Make sure userspace conntrack helper doesn't go over that size.
          Remove variable size ct extension infrastructure now that this code
          has no more clients. From Florian Westphal.
      
      27) Restore offset and length of nf_ct_ext structure to 8 bytes now
          that wraparound is not possible any longer, also from Florian.
      
      28) Allow to get rid of unassured flows under stress in conntrack,
          this applies to DCCP, SCTP and TCP protocols, from Florian.
      
      29) Shrink size of nf_conntrack_ecache structure, from Florian.
      
      30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
          from Gao Feng.
      
      31) Register SYNPROXY hooks on demand, from Florian Westphal.
      
      32) Use pernet hook whenever possible, instead of global hook
          registration, from Florian Westphal.
      
      33) Pass hook structure to ebt_register_table() to consolidate some
          infrastructure code, from Florian Westphal.
      
      34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
          SYNPROXY code, to make sure device stats are not fooled, patch
          from Gao Feng.
      
      35) Remove NF_CT_EXT_F_PREALLOC; this kills quite some code that we
          don't need anymore if we just select a fixed size instead of an
          expensive runtime calculation. From Florian.
      
      36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
          from Florian.
      
      37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
          Florian.
      
      38) Attach NAT extension on-demand from masquerade and pptp helper
          path, from Florian.
      
      39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.
      
      40) Speed up netns by selective calls of synchronize_net(), from
          Florian Westphal.
      
      41) Silence gcc stack size warning on 32-bit arches in the snmp helper,
          from Florian.
      
      42) Unconditionally call nf_ct_ext_destroy(), even if we have no
          extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
          Liping Zhang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a01aa920
    • Merge branch 'bpf-samples-skb_mode-bug-fixes' · edd7f4ef
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      samples/bpf: two bug fixes to XDP_FLAGS_SKB_MODE attaching
      
      Two small bugfixes for:
       commit 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      edd7f4ef
    • samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel · f76254a8
      Jesper Dangaard Brouer authored
      The xdp_tx_iptunnel program can be terminated in two ways, after
      N seconds or via Ctrl-C (SIGINT).  The SIGINT code path does not
      handle detaching the correct XDP program in case the program
      was attached with XDP_FLAGS_SKB_MODE.
      
      Fix this by storing the XDP flags as a global variable, which is
      available for the SIGINT handler function.
      
      Fixes: 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f76254a8
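The fix described in this commit, keeping the attach flags at file scope so the SIGINT handler can detach with the same flags, can be sketched roughly as follows. This is a minimal illustration, not the sample's actual code: `detach_xdp()`, `attach_and_install_handler()`, `SKB_MODE_FLAG` and the recording variables are hypothetical stand-ins introduced here.

```c
#include <signal.h>
#include <stdint.h>

/* Hypothetical stand-in for the real detach call; it only records its arguments. */
static int last_detach_ifindex = -1;
static uint32_t last_detach_flags;

static void detach_xdp(int ifindex, uint32_t flags)
{
    last_detach_ifindex = ifindex;
    last_detach_flags = flags;
}

/* Assumed value for illustration; the real constant comes from <linux/if_link.h>. */
#define SKB_MODE_FLAG (1U << 1)

/* File-scope state, so the signal handler knows how the program was attached. */
static int ifindex;
static uint32_t xdp_flags;

/* SIGINT handler: detach using the same flags that were used to attach. */
static void int_exit(int sig)
{
    (void)sig;
    detach_xdp(ifindex, xdp_flags);
}

/* Remember the attach parameters, then install the handler. */
static void attach_and_install_handler(int idx, uint32_t flags)
{
    ifindex = idx;
    xdp_flags = flags;
    signal(SIGINT, int_exit);
}
```

Without the file-scope `xdp_flags`, the handler would have no way to tell whether the program had been attached in SKB mode.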
    • samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int · 6387d011
      Jesper Dangaard Brouer authored
      The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
      IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
      samples/bpf/ should use the correct type.
      
      Fixes: 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6387d011
    • Merge branch 'xdp-netlink-ext-ack' · d74a32ac
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      xdp: use netlink extended ACK reporting
      
      This series is an attempt to make XDP more user friendly by
      exploiting the recently added netlink extended ACK
      reporting to carry messages to user space.
      
      David Ahern's iproute2 ext ack patches for ip link are sufficient
      to show the errors like this:
      
      Error: nfp: MTU too large w/ XDP enabled
      
      Where the message is coming directly from the driver.  There could
      still be a bit of a leap for a complete novice from the message
      above to the right settings, but it's a big improvement over the
      standard "Invalid argument" message.
      
      v1/non-rfc:
       - add a separate macro in patch 1;
       - add KBUILD_MODNAME as part of the message (Daniel);
       - don't print the error to logs in patch 1.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d74a32ac
    • virtio_net: make use of extended ack message reporting · 9861ce03
      Jakub Kicinski authored
      Try to carry error messages to the user via the netlink extended
      ack message attribute.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9861ce03
    • nfp: make use of extended ack message reporting · d957c0f7
      Jakub Kicinski authored
      Try to carry error messages to the user via the netlink extended
      ack message attribute.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d957c0f7
    • xdp: propagate extended ack to XDP setup · ddf9f970
      Jakub Kicinski authored
      Drivers usually have a number of restrictions for running XDP
      - most common being buffer sizes, LRO and number of rings.
      Even though some drivers try to be helpful and print error
      messages experience shows that users don't often consult
      kernel logs on netlink errors.  Try to use the new extended
      ack mechanism to carry the message back to user space.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ddf9f970
    • netlink: add NULL-friendly helper for setting extended ACK message · 45d9b378
      Jakub Kicinski authored
      As we propagate extended ack reporting throughout various paths in
      the kernel it may be that the same function is called with the
      extended ack parameter passed as NULL.  One place where that happens
      is in drivers which have a centralized reconfiguration function
      called both from ndos and from ethtool_ops.  Add a new helper for
      setting the error message in such conditions.
      
      Existing helper is left as is to encourage propagating the ext ack
      fully wherever possible.  It also makes it clear in the code which
      messages may be lost due to ext ack being NULL.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      45d9b378
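The distinction this commit draws between a strict helper and a NULL-friendly one can be illustrated with a small userspace sketch. Everything here is a simplified assumption: `struct ext_ack`, `SET_ERR_MSG`, `TRY_SET_ERR_MSG` and `reconfig()` are hypothetical names chosen for the example, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's extended ACK structure. */
struct ext_ack {
    const char *msg;
};

/* Strict variant: the caller guarantees that extack is not NULL. */
#define SET_ERR_MSG(extack, text) do {      \
    (extack)->msg = (text);                 \
} while (0)

/* NULL-friendly variant: the message is dropped when there is no extack. */
#define TRY_SET_ERR_MSG(extack, text) do {  \
    if (extack)                             \
        (extack)->msg = (text);             \
} while (0)

/*
 * A shared reconfiguration path that may be reached with or without an
 * extack, e.g. once from a netlink request and once from another interface.
 */
static int reconfig(struct ext_ack *extack, unsigned int mtu,
                    unsigned int max_mtu)
{
    if (mtu > max_mtu) {
        TRY_SET_ERR_MSG(extack, "MTU too large");
        return -1;
    }
    return 0;
}
```

Keeping the strict variant separate makes it visible at each call site whether a message can be silently lost.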
    • netfilter: nf_ct_ext: invoke destroy even when ext is not attached · 8eeef235
      Liping Zhang authored
      For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
      then remove it from the nat_bysource_table via nat_extend->destroy.
      
      But now, the nat extension is attached on demand, so if the nat extension
      is not attached, we will not be notified when the ct is destroyed, i.e.
      we may fail to remove ct from the nat_bysource_table.
      
      So just keep it simple, even if the extension is not attached, we will
      still invoke the related ext->destroy. And this will also preserve the
      flexibility for the future extension.
      
      Fixes: 9a08ecfe ("netfilter: don't attach a nat extension by default")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8eeef235
    • Merge tag 'ipvs3-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next · d1908ca8
      Pablo Neira Ayuso authored
      Simon Horman says:
      
      ====================
      Third Round of IPVS Updates for v4.12
      
      please consider these enhancements to IPVS for v4.12.
      If it is too late for v4.12 then please consider them for v4.13.
      
      * Remove unused function
      * Correct comparison of unsigned value
      ====================
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d1908ca8
    • netfilter: snmp: avoid stack size warning · 0e72f55f
      Florian Westphal authored
      net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
      of 1160 bytes is larger than 1024 bytes
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0e72f55f
    • netfilter: nf_queue: only call synchronize_net twice if nf_queue is active · 039b40ee
      Florian Westphal authored
      nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
      provided there is no nfqueue active in that net namespace (which is
      the common case).
      
      This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
      this gets called during netns cleanup so no packets should be queued.
      
      For the rare case of base chain being unregistered or module removal
      while nfqueue is in use the extra hiccup due to the packet drops isn't
      a big deal.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      039b40ee
    • netfilter: nf_log: don't call synchronize_rcu in nf_log_unset · c83fa196
      Florian Westphal authored
      nf_log_unregister() (which is what gets called in the logger backends
      module exit paths) does a (required, module is removed) synchronize_rcu().
      
      But nf_log_unset() is only called from pernet exit handlers. It doesn't
      free any memory so there appears to be no need to call synchronize_rcu.
      
      v2: Liping Zhang points out that nf_log_unregister() needs to be called
      after pernet unregister, else rmmod would become unsafe.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      c83fa196
    • netfilter: batch synchronize_net calls during hook unregister · 933bd83e
      Florian Westphal authored
      synchronize_net is expensive and slows down netns cleanup a lot.
      
      We have two APIs to unregister a hook:
      nf_unregister_net_hook (which calls synchronize_net())
      and
      nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)
      
      Make nf_unregister_net_hook a wrapper around new helper
      __nf_unregister_net_hook, which unlinks the hook but does not free it.
      
      Then, we can call that helper in nf_unregister_net_hooks and then
      call synchronize_net() only once.
      
      Andrey Konovalov reports that this change at least doubles syzkaller
      fuzzing speed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      933bd83e
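The batching idea in this commit (unlink every hook first, then wait once) can be modelled in userspace with a counter in place of the expensive wait. This is only a sketch under that assumption: `fake_synchronize()`, `struct hook` and the `unregister_*` functions are hypothetical names, and no real RCU or netfilter API is involved.

```c
#include <stddef.h>

/* Counts how many simulated grace-period waits were needed. */
static unsigned int sync_calls;

static void fake_synchronize(void)
{
    sync_calls++;
}

struct hook {
    int linked;
};

/* Unlink only: the hook is no longer reachable, but nothing waits yet. */
static void unlink_hook(struct hook *h)
{
    h->linked = 0;
}

/* Single-hook path: unlink, then wait once. */
static void unregister_one(struct hook *h)
{
    unlink_hook(h);
    fake_synchronize();
}

/* Batched path: unlink all hooks, then wait once for the whole batch. */
static void unregister_many(struct hook *hooks, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        unlink_hook(&hooks[i]);
    if (n)
        fake_synchronize();
}
```

Calling `unregister_one()` in a loop over n hooks costs n waits, while `unregister_many()` costs one, which is where the netns cleanup speedup comes from.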
    • net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay · 73333626
      Abhishek Shah authored
      This patch allows users to enable/disable internal TX and/or RX
      clock delay for BCM5481x series PHYs so as to satisfy RGMII timing
      specifications.
      
      On a particular platform, whether TX and/or RX clock delay is required
      depends on how the PHY is connected to the MAC IP. This requirement can
      be specified through the "phy-mode" property in the platform device tree.
      Signed-off-by: Abhishek Shah <abhishek.shah@broadcom.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73333626
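As a rough illustration of how a "phy-mode" string can select the internal clock delays, the sketch below maps the four common RGMII mode names to RX/TX delay settings. It assumes the conventional meaning of those names ("rgmii-id" both delays, "rgmii-rxid" RX only, "rgmii-txid" TX only); `struct clk_delay` and `rgmii_delays()` are hypothetical and not taken from the driver.

```c
#include <string.h>

/* Which clock delays the PHY should add internally (nonzero means enabled). */
struct clk_delay {
    int rx;
    int tx;
};

/* Returns 0 and fills *d for a recognised RGMII mode, -1 otherwise. */
static int rgmii_delays(const char *phy_mode, struct clk_delay *d)
{
    if (strcmp(phy_mode, "rgmii") == 0) {
        d->rx = 0;
        d->tx = 0;
    } else if (strcmp(phy_mode, "rgmii-id") == 0) {
        d->rx = 1;
        d->tx = 1;
    } else if (strcmp(phy_mode, "rgmii-rxid") == 0) {
        d->rx = 1;
        d->tx = 0;
    } else if (strcmp(phy_mode, "rgmii-txid") == 0) {
        d->rx = 0;
        d->tx = 1;
    } else {
        return -1;
    }
    return 0;
}
```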
    • net: sunhme: fix spelling mistakes: "ParityErro" -> "ParityError" · d8325650
      Colin Ian King authored
      Trivial fix to spelling mistakes in printk message.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8325650
    • bnx2x: Align RX buffers · 9b70de6d
      Scott Wood authored
      The bnx2x driver is not providing proper alignment on the receive buffers it
      passes to build_skb(), causing skb_shared_info to be misaligned.
      skb_shared_info contains an atomic, and while PPC normally supports
      unaligned accesses, it does not support unaligned atomics.
      
      Aligning the size of rx buffers will ensure that page_frag_alloc() returns
      aligned addresses.
      
      This can be reproduced on PPC by setting the network MTU to 1450 (or other
      non-multiple-of-4) and then generating sufficient inbound network traffic
      (one or two large "wget"s usually does it), producing the following oops:
      
      Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656
      Faulting instruction address: 0xc00000000080ef8c
      Oops: Kernel access of bad area, sig: 7 [#1]
      SMP NR_CPUS=2048
      NUMA
      PowerNV
      Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common
      CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761daf #2
      task: c00000ffd4892400 task.stack: c00000ffd4920000
      NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320
      REGS: c00000ffffc33710 TRAP: 0600   Not tainted  (4.11.0-rc8-00088-g4c761daf)
      MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
        CR: 24082042  XER: 00000000
      CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1
      GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100
      GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000
      GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000
      GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f
      GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001
      GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000
      GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a
      GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100
      NIP [c00000000080ef8c] __skb_clone+0xdc/0x140
      LR [c00000000080eee8] __skb_clone+0x38/0x140
      Call Trace:
      [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable)
      [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510
      [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80
      [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0
      [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260
      [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x]
      [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480
      [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0
      [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120
      [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200
      [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24
      [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110
      [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b70de6d
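The reasoning in this commit, that rounding the buffer size up keeps every buffer carved from the same region aligned, can be checked with a little arithmetic. The helpers below are an illustrative sketch: `align_up()` and `is_aligned()` are hypothetical names, and the 8-byte alignment used in the usage note is an assumption for the example rather than the value the driver uses.

```c
#include <stddef.h>

/* Round size up to the next multiple of align; align must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Nonzero when addr is a multiple of align (align must be a power of two). */
static int is_aligned(size_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With buffers laid out back to back from an aligned start, the n-th buffer sits at offset n * size. For a size of 1450 the third offset (4350) is not a multiple of 8, whereas with the rounded size of 1456 every offset is.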
    • net: bridge: Fix improper taking over HW learned FDB · 58073b32
      Arkadi Sharshevsky authored
      Commit 7e26bf45 ("net: bridge: allow SW learn to take over HW fdb
      entries") added the ability to "take over an entry which was previously
      learned via HW when it shows up from a SW port".
      
      However, if an entry was learned via HW and then a control packet
      (e.g., ARP request) was trapped to the CPU, the bridge driver will
      update the entry and remove the externally learned flag, although the
      entry is still present in HW. Instead, only clear the externally learned
      flag in case of roaming.
      
      Fixes: 7e26bf45 ("net: bridge: allow SW learn to take over HW fdb entries")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58073b32
    • ipv4: get rid of ip_ra_lock · ba3f571d
      WANG Cong authored
      After commit 1215e51e ("ipv4: fix a deadlock in ip_ra_control")
      we always take RTNL lock for ip_ra_control() which is the only place
      we update the list ip_ra_chain, so the ip_ra_lock is no longer needed.
      
      As Eric points out, BH does not need to be disabled either, RCU readers
      don't care.
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba3f571d
    • samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong · 5010e948
      Jesper Dangaard Brouer authored
      The struct bpf_map_def was extended in commit fb30d4b7 ("bpf: Add tests
      for map-in-map") with member unsigned int inner_map_idx.  This changed the size
      of the maps section in the generated ELF _kern.o files.
      
      Unfortunately the loader in bpf_load.c does not detect or handle this.  Thus,
      older _kern.o files became incompatible, and caused hard-to-debug errors
      where the syscall validation rejected BPF_MAP_CREATE request.
      
      This patch only detects the situation and aborts load_bpf_file(). It also
      adds code comments warning people who read this loader for inspiration
      about these pitfalls.
      
      Fixes: fb30d4b7 ("bpf: Add tests for map-in-map")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5010e948
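One way to detect a maps section whose size does not match the loader's idea of the definition struct is a divisibility check, sketched below. This is an assumption about the general approach, not a copy of the patch: `struct map_def` is a hypothetical layout whose last field stands in for the newly added member, and `count_map_defs()` is a made-up helper.

```c
#include <stddef.h>

/* Hypothetical map definition; the last field plays the role of the new member. */
struct map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
    unsigned int inner_map_idx;
};

/*
 * Returns the number of definitions in a section of the given size, or -1
 * when the section cannot be an array of struct map_def.
 */
static int count_map_defs(size_t section_size)
{
    if (section_size == 0 || section_size % sizeof(struct map_def) != 0)
        return -1;
    return (int)(section_size / sizeof(struct map_def));
}
```

A check of this kind is only a heuristic: with 4-byte fields, six definitions in the older five-field layout occupy 120 bytes, the same as five definitions in the new layout, so some mismatches would still pass.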
    • lwtunnel: fix error path in lwtunnel_fill_encap() · 39f37095
      Dan Carpenter authored
      We recently added a check to see if nla_nest_start() fails.  There are
      two issues with that.  First, if it fails then I don't think we should
      call nla_nest_cancel().  Second, it's slightly convoluted but the
      current code returns success but we should return -EMSGSIZE instead.
      
      Fixes: a50fe0ff ("lwtunnel: check return value of nla_nest_start")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39f37095
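The two points made in this commit (no cancel when the nest never started, and an error code instead of success) can be shown with a toy message buffer. The sketch is a userspace model under stated assumptions: `struct msg`, `nest_start()`, `nest_cancel()`, `put_payload()` and `fill_encap()` are hypothetical and only mimic the shape of the netlink helpers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical message buffer standing in for the skb being filled. */
struct msg {
    size_t len;
    size_t cap;
};

/* Returns the offset where the nest begins, or -1 if its header does not fit. */
static long nest_start(struct msg *m, size_t hdr)
{
    long start = (long)m->len;

    if (m->len + hdr > m->cap)
        return -1;
    m->len += hdr;
    return start;
}

/* Roll the message back to where the nest began. */
static void nest_cancel(struct msg *m, long start)
{
    m->len = (size_t)start;
}

static int put_payload(struct msg *m, size_t n)
{
    if (m->len + n > m->cap)
        return -EMSGSIZE;
    m->len += n;
    return 0;
}

static int fill_encap(struct msg *m, size_t payload)
{
    long nest = nest_start(m, 4);
    int ret;

    if (nest < 0)
        return -EMSGSIZE;   /* nothing was started, so nothing to cancel */

    ret = put_payload(m, payload);
    if (ret < 0) {
        nest_cancel(m, nest);   /* undo the partially written nest */
        return ret;
    }
    return 0;
}
```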
    • liquidio: silence a locking static checker warning · 77041e89
      Dan Carpenter authored
      Presumably we never hit this return, but static checkers complain that
      we need to unlock so we may as well fix that.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77041e89
    • qed: Unlock on error in qed_vf_pf_acquire() · 66117a9d
      Dan Carpenter authored
      My static checker complains that we're holding a mutex on this error
      path.  Let's goto exit instead of returning directly.
      
      Fixes: b0bccb69 ("qed: Change locking scheme for VF channel")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66117a9d
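The "goto exit instead of returning directly" pattern mentioned in this commit is sketched below with a fake lock that only tracks whether it is held. The names (`struct fake_mutex`, `acquire()`, the `request_ok` parameter) and the choice of -EAGAIN are assumptions for the example, not the driver's code.

```c
#include <errno.h>

/* Hypothetical lock that only records whether it is currently held. */
struct fake_mutex {
    int held;
};

static void fake_lock(struct fake_mutex *m)
{
    m->held = 1;
}

static void fake_unlock(struct fake_mutex *m)
{
    m->held = 0;
}

static int acquire(struct fake_mutex *m, int request_ok)
{
    int rc = 0;

    fake_lock(m);

    if (!request_ok) {
        rc = -EAGAIN;
        goto exit;  /* a bare "return rc" here would leave the lock held */
    }

    /* ... work done while holding the lock ... */

exit:
    fake_unlock(m);
    return rc;
}
```

Routing every exit through one label keeps the unlock in a single place, which is what static checkers look for.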
    • Merge branch 'hns-deferred-probe' · 1c942c94
      David S. Miller authored
      lipeng says:
      
      ====================
      net: hns: bug fix for HNS driver
      
      This patchset adds support for deferred dsaf probe when the mdio and
      mbigen modules have not been loaded (insmod) yet.
      
      For more details, please refer to individual patch.
      
      change log:
      V4 -> V5:
      1. Float on net-next;
      2. Delete patch "net: hns: fixed bug that skb used after kfree"
         from this patchset;
      
      V3 -> V4:
      1. Delete redundant commit message;
      2. Add Reviewed-by: Matthias Brugger <mbrugger@suse.com>;
      
      V2 -> V3:
      1. Check return value of platform_get_irq in hns_rcb_get_cfg;
      
      V1 -> V2:
      1. Return appropriate errno in hns_mac_register_phy;
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c942c94
    • net: hns: support deferred probe when no mdio · 804ffe5c
      lipeng authored
      In the hip06 and hip07 SoCs, the phy connects to the mdio bus. The mdio
      module is probed with module_init, and, as such,
      is not guaranteed to probe before the HNS driver. So we need
      to support deferred probe.
      
      We check for probe deferral in the mac init, so that we do not init DSAF
      when there is no mdio and then have to free all resources on later
      learning that we need to defer the probe.
      Signed-off-by: lipeng <lipeng321@huawei.com>
      Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
      Reviewed-by: Matthias Brugger <mbrugger@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      804ffe5c
    • net: hns: support deferred probe when can not obtain irq · 2fdd6baf
      lipeng authored
      In the hip06 and hip07 SoCs, the interrupt lines from the
      DSAF controllers are connected to mbigen hw module.
      The mbigen module is probed with module_init, and, as such,
      is not guaranteed to probe before the HNS driver. So we need
      to support deferred probe.
      Signed-off-by: lipeng <lipeng321@huawei.com>
      Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
      Reviewed-by: Matthias Brugger <mbrugger@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fdd6baf
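Deferred probing in general works by having a driver's probe return a special error when a dependency is not ready yet, so that the probe is retried later. The sketch below models that flow in userspace. `EPROBE_DEFER` is defined locally because it is a kernel-internal errno; `struct dep`, `lookup_irq()`, `probe()` and the IRQ number 42 are hypothetical.

```c
/* Kernel-internal errno value; userspace <errno.h> does not provide it. */
#define EPROBE_DEFER 517

/* Hypothetical dependency, e.g. an interrupt controller that probes separately. */
struct dep {
    int ready;
};

/* Returns an IRQ number, or -EPROBE_DEFER while the provider is not ready. */
static int lookup_irq(const struct dep *irq_ctrl)
{
    return irq_ctrl->ready ? 42 : -EPROBE_DEFER;
}

/* Propagate the lookup error unchanged so the caller can retry the probe later. */
static int probe(const struct dep *irq_ctrl, int *irq_out)
{
    int irq = lookup_irq(irq_ctrl);

    if (irq < 0)
        return irq;
    *irq_out = irq;
    return 0;
}
```

The important detail is that the error from the lookup is passed through rather than replaced by a generic failure code, since only the deferral code triggers a retry.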
    • Merge branch 'nfp-XDP_TX-optimizations' · ba1d82e6
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: optimize XDP TX and small fixes
      
      This series optimizes the nfp XDP TX performance a little bit.
      I ran quick tests on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
      Single core/queue performance for both touch and drop and touch and
      forward is above 20Mpps @64B packets, drop being 2Mpps faster.
      I think this is max for a single queue on the low power NFPs.
      
      There are also a few minor fixes included for code in net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba1d82e6
    • nfp: provide 256 bytes of XDP headroom in all configurations · dbf637ff
      Jakub Kicinski authored
      For legacy reasons NFP FW may be compiled to DMA packets to a constant
      offset into the buffer and use the space before it for metadata.  This
      ensures that packets data always start at a certain offset regardless of
      the amount of preceding metadata.
      
      If rx offset is set to 0 there may still be up to 64 bytes of metadata
      but metadata will start at the beginning of the buffer, instead of:
      
          data_start_offset = rx_offset - meta_len
      
      Even though we make the buffers larger to accommodate up to 64 bytes of
      metadata, if there is only N bytes of metadata, we will end up with
      N bytes of headroom and 64 - N bytes of tailroom.  Therefore we can't
      rely on that space for XDP headroom.  Make sure we always allocate
      full 256 bytes.  This, unfortunately, means we can't fit the headroom
      on an u8 any more.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dbf637ff
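Two pieces of arithmetic from this commit message can be made concrete: how the 64 reserved bytes split when only N bytes of metadata are present, and why a headroom of 256 no longer fits in a u8. The helpers below are a hypothetical sketch; `META_RESERVE`, `room_before()`, `room_after()` and `as_u8()` are names invented for the example.

```c
#include <stdint.h>

#define META_RESERVE 64u    /* bytes reserved for metadata, per the text above */

/* With n bytes of metadata present, n bytes end up in front of the packet... */
static unsigned int room_before(unsigned int n)
{
    return n;
}

/* ...and the rest of the reserved area ends up behind it. */
static unsigned int room_after(unsigned int n)
{
    return META_RESERVE - n;
}

/* Storing a headroom value in 8 bits keeps only its low byte. */
static uint8_t as_u8(unsigned int headroom)
{
    return (uint8_t)headroom;
}
```

Since an 8-bit field holds at most 255, a headroom of 256 would wrap to 0, which is why a wider type is needed.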
    • nfp: don't completely refuse to work with old flashes · 85cb207e
      Jakub Kicinski authored
      Right now the required Service Process ABI version is still tied
      to max ID of known commands.  For the new NSP commands we are adding,
      we check whether the NSP version is recent enough on a command-by-command
      basis.  The driver doesn't have to force the device to have the
      very latest flash, anything newer than 0.8 should do.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85cb207e
    • nfp: avoid reading TX queue indexes from the device · d38df0d3
      Jakub Kicinski authored
      Reading TX queue indexes from the device memory on each interrupt
      is expensive.  It's doubly expensive with XDP running since we have
      two TX rings to check there.  If the software indexes indicate that
      the TX queue is completely empty, however, we don't need to look at
      the device completion index at all.
      
      The queuing CPU is doing a wmb() before kicking the device TX so
      it should be safe to assume that the CPU handling the completions will
      never see an old value of the software copy of the index.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d38df0d3
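The optimisation described in this commit, skipping the device read when the software indexes already show an empty ring, is sketched below. The ring layout and names (`struct tx_ring`, `read_hw_index()`, `tx_complete()`) are assumptions made for illustration; the "device" is just a field in the struct, with a counter to show how often it was consulted.

```c
#include <stdint.h>

struct tx_ring {
    uint32_t wr_p;          /* software producer index */
    uint32_t rd_p;          /* software consumer index */
    uint32_t hw_rd_p;       /* stand-in for the index kept in device memory */
    unsigned int hw_reads;  /* how often the "device" was consulted */
};

/* Models the expensive read from device memory. */
static uint32_t read_hw_index(struct tx_ring *r)
{
    r->hw_reads++;
    return r->hw_rd_p;
}

/* Returns the number of descriptors newly completed by the device. */
static uint32_t tx_complete(struct tx_ring *r)
{
    uint32_t done;

    /* Nothing outstanding according to software: skip the device read. */
    if (r->wr_p == r->rd_p)
        return 0;

    done = read_hw_index(r) - r->rd_p;
    r->rd_p += done;
    return done;
}
```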
    • nfp: do simple XDP TX buffer recycling · 92e68195
      Jakub Kicinski authored
      On the RX path we follow the "drop if allocation of replacement
      buffer fails" rule.  With XDP we extended that to the TX action,
      so if the XDP prog returns TX but allocation of the replacement RX
      buffer fails, we drop the packet.
      
      To improve our XDP TX performance, extend the idea of rings being
      always full to XDP TX rings.  Pre-fill the XDP TX rings with RX
      buffers, and when the XDP prog returns the TX action, swap the RX
      buffer with the next buffer from the TX ring.
      
      XDP TX completion will no longer free the buffers; instead it lets
      them sit on the TX ring until they are swapped with an RX buffer.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92e68195
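The swap at the heart of this recycling scheme might look roughly like the sketch below. All names and the fixed ring size are hypothetical, and the sketch assumes the caller has already confirmed that the slot at the write index is no longer owned by the hardware.

```c
#include <assert.h>
#include <stddef.h>

#define XDP_RING_SIZE 4 /* hypothetical; must be a power of two */

/* Hypothetical XDP TX ring that is always full: every slot holds either
 * a buffer being transmitted or a spare left over from an earlier TX. */
struct xdp_tx_ring {
    void *bufs[XDP_RING_SIZE];
    unsigned int wr_p; /* free-running write index */
};

/* On an XDP TX verdict: put the received buffer on the TX ring and hand
 * back the buffer that was sitting in that slot, so it can be posted to
 * the RX ring as the replacement. No allocation happens on this path. */
static void *xdp_tx_swap(struct xdp_tx_ring *tx, void *rx_buf)
{
    unsigned int idx;
    void *spare;

    assert(tx != NULL && rx_buf != NULL);

    idx = tx->wr_p++ & (XDP_RING_SIZE - 1);
    spare = tx->bufs[idx];
    tx->bufs[idx] = rx_buf;
    return spare;
}
```

Because the spare always comes from the TX ring, the "drop if allocation fails" case no longer applies to the TX action in this model.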
    • Jakub Kicinski's avatar
      nfp: drop rx_ring param from buffer allocation · d78005a5
      Jakub Kicinski authored
      We will soon allocate RX buffers for caching on XDP TX rings.
      The rx_ring parameter passed to nfp_net_rx_alloc_one() is not
      actually used, so remove it.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d78005a5
    • Jakub Kicinski's avatar
      nfp: replace -ENOTSUPP with -EOPNOTSUPP · 46c50518
      Jakub Kicinski authored
      As Or points out in commit 423b3aec ("net/mlx4: Change ENOTSUPP
      to EOPNOTSUPP"), ENOTSUPP is an NFS-specific error.  Replace it with
      EOPNOTSUPP.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46c50518
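A small userspace sketch shows why the distinction matters. ENOTSUPP is defined only inside the kernel (as 524 in include/linux/errno.h) and is not part of the userspace <errno.h>, so the C library has no proper message for it, whereas EOPNOTSUPP is a standard errno value. The macro and helper names below are made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal value of ENOTSUPP; userspace <errno.h> does not
 * provide it, so it is spelled out here for illustration only. */
#define KERNEL_ENOTSUPP 524

/* Copy the C library's description of an errno value into buf. */
static void errno_text(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s", strerror(err));
}

/* Print both descriptions side by side. */
static void show_both(void)
{
    char good[128], bad[128];

    errno_text(EOPNOTSUPP, good, sizeof(good));
    errno_text(KERNEL_ENOTSUPP, bad, sizeof(bad));
    printf("EOPNOTSUPP (%d): %s\n", EOPNOTSUPP, good);
    printf("ENOTSUPP   (%d): %s\n", KERNEL_ENOTSUPP, bad);
}
```

The exact text printed for 524 depends on the C library, which is why no expected output is given here.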
    • Willem de Bruijn's avatar
      virtio-net: use netif_tx_napi_add for tx napi · 1d11e732
      Willem de Bruijn authored
      Avoid hashing the tx napi struct into napi_hash[], which is used for
      busy polling receive queues.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d11e732
    • David Howells's avatar
      net: Initialise init_net.count to 1 · b5082df8
      David Howells authored
      Initialise init_net.count to 1 for its pointer from init_nsproxy lest
      someone tries to do a get_net() and a put_net() in a process in which
      current->nsproxy->net_ns points to the initial network namespace.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b5082df8
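The hazard can be shown with a toy reference counter. This is not the kernel's get_net()/put_net() implementation; the names and the "released" flag are invented to make the effect visible: with a starting count of 0, a balanced get/put pair drops the count back to zero and triggers the release path, while a starting count of 1 (held on behalf of the static pointer) does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted namespace object. */
struct toy_net {
    int count;
    bool released; /* set when the last reference is dropped */
};

static void toy_get(struct toy_net *n)
{
    assert(n != NULL);
    n->count++;
}

static void toy_put(struct toy_net *n)
{
    assert(n != NULL && n->count > 0);
    if (--n->count == 0)
        n->released = true; /* the real code would start teardown here */
}
```

Starting at 1 models the reference that the initial nsproxy holds for the whole lifetime of the system.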
    • Girish Moodalbail's avatar
      geneve: fix incorrect setting of UDP checksum flag · 5e0740c4
      Girish Moodalbail authored
      Creating a geneve link with 'udpcsum' set results in the creation of a
      link for which the UDP checksum will NOT be computed on outbound
      packets, as can be seen below.
      
      11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
          link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0
          geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum
      
      Similarly, creating a link with 'noudpcsum' set results in the creation
      of a link for which the UDP checksum will be computed on outbound packets.
      
      Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.")
      Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Lance Richardson <lrichard@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e0740c4
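The symptom described above is what an inverted boolean test produces. The sketch below illustrates that class of bug only; it is not taken from the geneve patch, and the flag value and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TUNNEL_CSUM 0x1u /* hypothetical "compute UDP checksum" flag */

/* Inverted mapping: asking for checksums clears the flag, and vice versa. */
static uint32_t tun_flags_inverted(bool udpcsum)
{
    return udpcsum ? 0u : TOY_TUNNEL_CSUM;
}

/* Intended mapping: the flag follows the user's request. */
static uint32_t tun_flags_intended(bool udpcsum)
{
    return udpcsum ? TOY_TUNNEL_CSUM : 0u;
}

/* Small self-check contrasting the two mappings. */
static void self_check(void)
{
    assert(tun_flags_intended(true) == TOY_TUNNEL_CSUM);
    assert(tun_flags_intended(false) == 0u);
    assert(tun_flags_inverted(true) != tun_flags_intended(true));
}
```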
    • David S. Miller's avatar
      Merge branch 'vxlan-disabled-ipv6' · c4879789
      David S. Miller authored
      Jiri Benc says:
      
      ====================
      vxlan: do not error out on disabled IPv6
      
      This patchset fixes a bug with metadata based tunnels when booted with
      ipv6.disable=1.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4879789
    • Jiri Benc's avatar
      vxlan: do not output confusing error message · baf4d786
      Jiri Benc authored
      The message "Cannot bind port X, err=Y" creates only confusion. In metadata
      based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
      no error message should be printed. But when an IPv6 tunnel was requested,
      such a failure is fatal. vxlan_socket_create() does not know when the error
      is harmless and when it is not.
      
      Instead of passing such information down to vxlan_socket_create(), remove
      the message completely. It's not useful. We propagate the error code up to
      user space, and the port number comes from user space. There's nothing in
      the message that the process creating the vxlan interface does not know.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      baf4d786
    • Jiri Benc's avatar
      vxlan: correctly handle ipv6.disable module parameter · d074bf96
      Jiri Benc authored
      When IPv6 is compiled in but disabled at runtime, __vxlan_sock_add returns
      -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
      operation of bringing up the tunnel.
      
      Ignore failure of IPv6 socket creation for metadata based tunnels caused by
      IPv6 not being available.
      
      Fixes: b1be00a6 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d074bf96
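The rule from this commit (tolerate a missing IPv6 stack only for metadata based tunnels) can be sketched as follows. This is a simplified model under stated assumptions, not the vxlan driver's code: the callback type, the function names, and the stub that simulates booting with ipv6.disable=1 are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-family socket creation hook: 0 or a negative errno. */
typedef int (*sock_add_fn)(bool ipv6);

/* Stub simulating a system with IPv6 disabled at runtime. */
static int sock_add_ipv6_disabled(bool ipv6)
{
    return ipv6 ? -EAFNOSUPPORT : 0;
}

/* Metadata based tunnels open both IPv4 and IPv6 sockets; other tunnels
 * open only the requested family. */
static int tunnel_sock_add(bool metadata, bool ipv6_requested, sock_add_fn add)
{
    bool ipv6 = ipv6_requested || metadata;
    bool ipv4 = !ipv6_requested || metadata;
    int ret = 0;

    assert(add != NULL);

    if (ipv6) {
        ret = add(true);
        if (ret < 0) {
            /* Fatal unless this is a metadata tunnel and the only
             * problem is that IPv6 is unavailable. */
            if (!(metadata && ret == -EAFNOSUPPORT))
                return ret;
            ret = 0;
        }
    }
    if (ipv4)
        ret = add(false);
    return ret;
}
```

With the stub above, a metadata based tunnel comes up on IPv4 alone, while an explicitly requested IPv6 tunnel still fails with -EAFNOSUPPORT.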