1. 01 May, 2017 40 commits
    • net: dsa: mv88e6xxx: add max VID to info · 3cf3c846
      Vivien Didelot authored
      Some chips don't have a VLAN Table Unit; most do and have a 4K
      table, while others, such as the 88E6390 family, have a 13th bit for the VID.
      
      Add a new max_vid member to the info structure, used to check the
      presence of a VTU as well as the value used to iterate from in VTU
      GetNext operations.
      
      This makes the MV88E6XXX_FLAG_VTU obsolete, thus remove it.
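
A minimal sketch of the idea, assuming a simplified stand-in for the per-chip info structure. The names struct chip_info, chip_has_vtu() and vtu_getnext_start() are hypothetical and not taken from the driver; the only points illustrated are that a zero max_vid can double as "no VTU" and that the same value can seed a GetNext walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's per-chip info structure. */
struct chip_info {
	const char *name;
	uint16_t max_vid;	/* 0 means the chip has no VLAN Table Unit */
};

/* A chip has a VTU if and only if it advertises a non-zero maximum VID. */
static bool chip_has_vtu(const struct chip_info *info)
{
	return info->max_vid != 0;
}

/* VID from which a GetNext walk would start; only meaningful with a VTU. */
static uint16_t vtu_getnext_start(const struct chip_info *info)
{
	return info->max_vid;
}
```

With this layout, a separate "has VTU" flag carries no extra information, which is presumably why the flag could be dropped.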
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: Indicate xfrm_state offload errors · 152afb9b
      Ilan Tayari authored
      Current code silently ignores driver errors when configuring
      IPSec offload xfrm_state, and falls back to host-based crypto.
      
      Fail the xfrm_state creation if the driver has an error, because
      the NIC offloading was explicitly requested by the user program.
      
      This will communicate back to the user that there was an error.
      
      Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/esp4: Fix invalid esph pointer crash · 67d349ed
      Ilan Tayari authored
      Both esp_output and esp_xmit take a pointer to the ESP header
      and place it in esp_info struct prior to calling esp_output_head.
      
      Inside esp_output_head, the call to esp_output_udp_encap
      makes sure to update the pointer if it gets invalid.
      However, if esp_output_head itself calls skb_cow_data, the
      pointer is not updated and stays invalid, causing a crash
      after esp_output_head returns.
      
      Update the pointer if it becomes invalid in esp_output_head
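
The stale-pointer pattern described above is not specific to skbs. A minimal user-space sketch, assuming a plain realloc()-backed buffer in place of skb_cow_data(); struct buf, struct hdr_info and grow_and_fix() are hypothetical names, not kernel APIs. The point is that the offset survives a move of the data, while a cached pointer does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: a growable buffer and a struct caching a pointer into it. */
struct buf {
	unsigned char *data;
	size_t len;
};

struct hdr_info {
	unsigned char *hdr;	/* points inside buf->data */
};

/*
 * Grow the buffer. realloc() may move the data, so any cached pointer
 * must be rebuilt from its offset, which stays valid across the move.
 * Returns 0 on success, -1 on allocation failure.
 */
static int grow_and_fix(struct buf *b, struct hdr_info *info, size_t extra)
{
	size_t off = (size_t)(info->hdr - b->data);	/* save the offset first */
	unsigned char *p = realloc(b->data, b->len + extra);

	if (!p)
		return -1;
	memset(p + b->len, 0, extra);
	b->data = p;
	b->len += extra;
	info->hdr = b->data + off;	/* refresh the cached pointer */
	return 0;
}
```

Any function that can reallocate the buffer has to perform the refresh itself, since its caller cannot tell whether the data moved.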
      
      Fixes: fca11ebd ("esp4: Reorganize esp_output")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: Fix missing tunnel encapsulation limit option · 89a23c8b
      Craig Gallek authored
      The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
      IPV6_TLV_PADN options when an encapsulation limit is defined (the
      default is a limit of 4).  An MTU adjustment is done to account for
      these options as well.  However, the options are never present in the
      generated packets.
      
      The issue appears to be a subtlety between IPV6_DSTOPTS and
      IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
      written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
      dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
      (correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
      are to be used.  This caused the options to no longer be included in v6
      encapsulated packets.
      
      The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
      instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
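
For orientation, the two options mentioned above fit in a single 8-octet Destination Options header. The following is a sketch of that layout only, assuming the option type values 4 (Tunnel Encapsulation Limit, RFC 2473) and 1 (PadN); build_encap_limit_dstopt() and the macro names are hypothetical and this is not the tunnel driver's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLV_PADN		1	/* PadN option type */
#define TLV_TNL_ENCAP_LIMIT	4	/* Tunnel Encapsulation Limit option type */
#define DSTOPT_LEN		8	/* one 8-octet unit */

/*
 * Fill an 8-byte Destination Options header carrying a Tunnel
 * Encapsulation Limit option followed by PadN padding.
 * next_hdr is the protocol number of the header that follows.
 */
static void build_encap_limit_dstopt(uint8_t out[DSTOPT_LEN],
				     uint8_t next_hdr, uint8_t encap_limit)
{
	memset(out, 0, DSTOPT_LEN);
	out[0] = next_hdr;		/* Next Header */
	out[1] = 0;			/* Hdr Ext Len: 8-octet units beyond the first */
	out[2] = TLV_TNL_ENCAP_LIMIT;	/* option type */
	out[3] = 1;			/* option data length */
	out[4] = encap_limit;		/* option data */
	out[5] = TLV_PADN;		/* pad to the 8-octet boundary */
	out[6] = 1;			/* one byte of padding data */
	out[7] = 0;			/* the padding byte itself */
}
```

These 8 bytes are the kind of per-packet overhead that an MTU adjustment has to account for.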
      
      Fixes: 1df64a85: ("[IPV6]: Add ip6ip6 tunnel driver.")
      Fixes: 333fad53: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-upstream' of... · f9ed236c
      David S. Miller authored
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2017-04-30
      
      Here's one last batch of Bluetooth patches in the bluetooth-next tree
      targeting the 4.12 kernel.
      
       - Remove custom ECDH implementation and use new KPP API instead
       - Add protocol checks to hci_ldisc
       - Add module license to HCI UART Nokia H4+ driver
       - Minor fix for 32bit user space - 64 bit kernel combination
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: documentation: fix whitespace issues · d5066c46
      Liam Beguin authored
      Figure 1 is full of whitespaces; fix it
      Signed-off-by: Liam Beguin <lbeguin@tycoint.com>
      Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
      Acked-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Simplify VRF enslavement · b1e45526
      Ido Schimmel authored
      When a netdev is enslaved to a VRF master, its router interface (RIF)
      needs to be destroyed (if exists) and a new one created using the
      corresponding virtual router (VR).
      
      From the driver's perspective, the above is equivalent to an inetaddr
      event sent for this netdev. Therefore, when a port netdev (or one of its
      uppers) is enslaved to a VRF master, call the same function that
      would've been called had a NETDEV_UP been sent for this netdev in the
      inetaddr notification chain.
      
      This patch also fixes a bug when a LAG netdev with an existing RIF is
      enslaved to a VRF. Before this patch, each LAG port would drop the
      reference on the RIF, but would re-join the same one (in the wrong VR)
      soon after. With this patch, the corresponding RIF is first destroyed
      and a new one is created using the correct VR.
      
      Fixes: 7179eb5a ("mlxsw: spectrum_router: Add support for VRFs")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2017-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · cedf90c0
      David S. Miller authored
      mlx5-updates-2017-04-30
      
      Or says:
      ================
      mlx5 neigh update
      
      This series (whose code name is 'neigh update') from Hadar, enhances the
      mlx5 TC IP tunnel offloads to deal with changes to tunnel destination
      neighbours used in offloaded flows which involved encapsulation.
      
      In order to keep track of the validity state of such neighbours, we register
      a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour
      becomes valid, offload the related flows to HW (and the other way around when
      a neighbour becomes invalid), and similarly when a neighbour's MAC address changes.
      
      Since this traffic is offloaded from the host OS, the neighbour for the IP
      tunnel destination can mistakenly become STALE and deleted by the kernel
      since its 'used' value wasn't changed. To address that, we proactively
      update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using
      time stamps generated by the existing driver code for HW flow counters.
      We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates.
      
      Prior to the core of the series, there's a patch from Saeed that introduces an
      extendable vport representor implementation scheme. It separates the eswitch
      aspects of the representors from the netdev-related ones.
      
      We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice
      through the long design and review cycles while we struggled to understand and
      (hopefully correctly) implement the locking around the different driver flows(..) .
      
      - Or.
      =================
      
      Misc Updates:
      
      From Tariq:
      Some small performance and trivial code optimization for mlx5 netdev driver
      - Optimize poll ICOSQ completion queue
      - Use prefetchw when a write is to follow
      - Use u8 as ownership type in mlx5e_get_cqe()
      
      From Eran:
      - Disable LRO by default on specific setups
      
      From Eli:
      - Small cleanup for E-Switch to avoid redundant allocation
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Prevent warning without CONFIG_RFS_ACCEL · 07ff2ed0
      Mintz, Yuval authored
      After removing the PTP related initialization from slowpath start,
      the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set.
      Otherwise, it leads to a warning due to it being unused.
      
      Fixes: d179bd16 ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP")
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-RoCE-fixes' · a6e8ab8e
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      qed: RoCE related pseudo-fixes
      
      This series contains multiple small corrections to the RoCE logic
      in qed, plus some debug information and an inter-module parameter
      meant to prevent issues further along.
      
       - #1, #6 Share information with protocol driver
         [either new or filling missing bits in existing API].
       - #2, #3 correct error flows in qed.
       - #4 add debug related information.
       - #5 fixes a minor issue in the HW configuration.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: output the DPM status and WID count · 20b1bd96
      Ram Amrani authored
      Output to the RDMA driver whether DPM mode is enabled or disabled in
      the HW and, if enabled, the number of WIDs it supports.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: align DPI configuration to HW requirements · 107392b7
      Ram Amrani authored
      When calculating doorbell BAR partitioning round up the number of
      CPUs to the nearest power of 2 so the size of the DPI (per user
      section) configured in the hardware will be stored properly and
      not truncated.
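
The rounding step can be sketched in plain C as below. round_up_pow2() is a hypothetical helper written for illustration (the kernel has its own roundup_pow_of_two()); it assumes an input between 1 and 2^31 so the result fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round n up to the nearest power of two (n itself if it already is one).
 * Assumes 1 <= n <= 2^31 so the result fits in 32 bits.
 */
static uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

For example, a machine with 24 CPUs would be treated as having 32 for the purpose of sizing each section.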
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: verify RoCE resource bitmaps are released · e015d58b
      Ram Amrani authored
      Add mechanism to verify RoCE resources are released prior to freeing the
      bitmaps. If this is not the case, print what resources were not released.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: add error handling flow to TID deregistration posting failure · 10536194
      Ram Amrani authored
      If the posting of the ramrod for the purpose of TID deregistration
      fails, abort the deregistration operation without using the FW's
      return code.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: remove unused SQ error state · ba0154e9
      Ram Amrani authored
      The internal RoCE SQE QP state isn't being used. Instead we mark the
      QP as in regular error state.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 793ea8a9
      Ram Amrani authored
    • bpf: enhance verifier to understand stack pointer arithmetic · 332270fd
      Yonghong Song authored
      llvm 4.0 and above generates the code like below:
      ....
      440: (b7) r1 = 15
      441: (05) goto pc+73
      515: (79) r6 = *(u64 *)(r10 -152)
      516: (bf) r7 = r10
      517: (07) r7 += -112
      518: (bf) r2 = r7
      519: (0f) r2 += r1
      520: (71) r1 = *(u8 *)(r8 +0)
      521: (73) *(u8 *)(r2 +45) = r1
      ....
      and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
      This is because verifier marks register r2 as unknown value after #519
      where r2 is a stack pointer and r1 holds a constant value.
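
The situation above can be pictured with a toy register tracker: if adding a known constant to a stack pointer yields another stack pointer at a new offset, the store at insn #521 resolves to a checkable frame offset. This is a heavily simplified sketch and not the verifier's code; enum reg_type, struct reg_state, add_imm(), add_reg() and stack_access_ok() are hypothetical, and the stack size is passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified register state; not the verifier's real types. */
enum reg_type { UNKNOWN_VALUE, CONST_IMM, PTR_TO_STACK };

struct reg_state {
	enum reg_type type;
	int64_t val;	/* constant value, or offset from the frame pointer */
};

/* Model "dst += imm": a stack pointer plus a constant stays a stack pointer. */
static void add_imm(struct reg_state *dst, int64_t imm)
{
	if (dst->type == PTR_TO_STACK || dst->type == CONST_IMM)
		dst->val += imm;
	else
		dst->type = UNKNOWN_VALUE;
}

/* Model "dst += src": only a known-constant src keeps the pointer type. */
static void add_reg(struct reg_state *dst, const struct reg_state *src)
{
	if (src->type == CONST_IMM) {
		add_imm(dst, src->val);
	} else {
		dst->type = UNKNOWN_VALUE;
		dst->val = 0;
	}
}

/* An access of `size` bytes at ptr+off is valid only inside the stack frame. */
static bool stack_access_ok(const struct reg_state *ptr, int64_t off,
			    int64_t size, int64_t stack_size)
{
	int64_t start = ptr->val + off;

	return ptr->type == PTR_TO_STACK &&
	       start >= -stack_size && start + size <= 0;
}
```

Replaying insns #516 to #521 in this model gives r2 an offset of -112 + 15 = -97, so the one-byte store at r2 + 45 lands at frame offset -52.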
      
      Teach verifier to recognize "stack_ptr + imm" and
      "stack_ptr + reg with const val" as valid stack_ptr with new offset.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • benet: Use time_before_eq for time comparison · 2faf2657
      Karim Eshapa authored
      Use time_before_eq for the time comparison. It is safer and handles
      timer wrapping, which makes the code future-proof.
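
Why a wrap-aware comparison matters can be shown with a user-space sketch. tick_before_eq() below mirrors the usual signed-difference trick; it and naive_before_eq() are hypothetical names written for this illustration, and the cast assumes the common two's-complement conversion from unsigned long to long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Wrap-safe "a is before or equal to b" for free-running tick counters.
 * The unsigned subtraction wraps modulo 2^N; interpreting the result as
 * signed gives the right answer while the two values are less than half
 * the counter range apart.
 */
static bool tick_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/* The naive comparison, shown only for contrast: it breaks across a wrap. */
static bool naive_before_eq(unsigned long a, unsigned long b)
{
	return a <= b;
}
```

With a counter that has just wrapped, the naive form reports the earlier timestamp as later, while the signed-difference form still orders the two correctly.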
      Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flower: check unused bits in MPLS fields · 1a7fca63
      Benjamin LaHaise authored
      Since several of the netlink attributes used to configure the flower
      classifier's MPLS TC, BOS and Label fields have additional bits which are
      unused, check those bits to ensure that they are actually 0 as suggested
      by Jamal.
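
The check amounts to masking each value against its field width. A sketch, assuming the standard MPLS label stack entry widths (20-bit label, 3-bit TC, 1-bit BOS); the macro and function names are hypothetical and not taken from the classifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field widths in an MPLS label stack entry. */
#define MPLS_LABEL_MASK	0xFFFFFu	/* 20-bit label */
#define MPLS_TC_MASK	0x7u		/* 3-bit traffic class */
#define MPLS_BOS_MASK	0x1u		/* 1-bit bottom-of-stack flag */

/* Reject values that set any bit outside the field's defined width. */
static bool mpls_label_valid(uint32_t label) { return !(label & ~MPLS_LABEL_MASK); }
static bool mpls_tc_valid(uint8_t tc)        { return !(tc & ~MPLS_TC_MASK); }
static bool mpls_bos_valid(uint8_t bos)      { return !(bos & ~MPLS_BOS_MASK); }
```

Rejecting stray bits up front keeps them free for later use instead of letting user space come to depend on them being ignored.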
      Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Cc: Jakub Kicinski <kubakici@wp.pl>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · a01aa920
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree. A large bunch of code cleanups, simplify the conntrack extension
      codebase, get rid of the fake conntrack object, speed up netns by
      selective synchronize_net() calls. More specifically, they are:
      
      1) Check for ct->status bit instead of using nfct_nat() from IPVS and
         Netfilter codebase, patch from Florian Westphal.
      
      2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.
      
      3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.
      
      4) Introduce nft_is_base_chain() helper function.
      
      5) Enforce expectation limit from userspace conntrack helper,
         from Gao Feng.
      
      6) Add nf_ct_remove_expect() helper function, from Gao Feng.
      
      7) NAT mangle helper function return boolean, from Gao Feng.
      
      8) ctnetlink_alloc_expect() should only work for conntrack with
         helpers, from Gao Feng.
      
      9) Add nfnl_msg_type() helper function to nfnetlink to build the
         netlink message type.
      
      10) Get rid of unnecessary cast on void, from Simran Singhal.
      
      11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
          also from Simran Singhal.
      
      12) Use list_prev_entry() from nf_tables, from Simran Singhal.
      
      13) Remove unnecessary & on pointer function in the Netfilter and IPVS
          code.
      
      14) Remove obsolete comment on set of rules per CPU in ip6_tables,
          no longer true. From Arushi Singhal.
      
      15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.
      
      16) Remove unnecessary nested rcu_read_lock() in
          __nf_nat_decode_session(). Code running from hooks is already
          guaranteed to run under RCU read side.
      
      17) Remove dead code in nf_tables_getobj(), from Aaron Conole.
      
      18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
          also from Aaron.
      
      19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.
      
      20) Don't propagate NF_DROP error to userspace via ctnetlink in
          __nf_nat_alloc_null_binding() function, from Gao Feng.
      
      21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
          from Gao Feng.
      
      22) Kill the fake untracked conntrack objects, use ctinfo instead to
          annotate that a conntrack object is untracked, from Florian Westphal.
      
      23) Remove nf_ct_is_untracked(), now obsolete since we have no
          conntrack template anymore, from Florian.
      
      24) Add event mask support to nft_ct, also from Florian.
      
      25) Move nf_conn_help structure to
          include/net/netfilter/nf_conntrack_helper.h.
      
      26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
          Thus, we don't deal with variable conntrack extensions anymore.
          Make sure userspace conntrack helper doesn't go over that size.
          Remove variable size ct extension infrastructure now that this code
          has no more clients. From Florian Westphal.
      
      27) Restore offset and length of nf_ct_ext structure to 8 bytes now
          that wraparound is not possible any longer, also from Florian.
      
      28) Allow to get rid of unassured flows under stress in conntrack,
          this applies to DCCP, SCTP and TCP protocols, from Florian.
      
      29) Shrink size of nf_conntrack_ecache structure, from Florian.
      
      30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
          from Gao Feng.
      
      31) Register SYNPROXY hooks on demand, from Florian Westphal.
      
      32) Use pernet hook whenever possible, instead of global hook
          registration, from Florian Westphal.
      
      33) Pass hook structure to ebt_register_table() to consolidate some
          infrastructure code, from Florian Westphal.
      
      34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
          SYNPROXY code, to make sure device stats are not fooled, patch
          from Gao Feng.
      
      35) Remove NF_CT_EXT_F_PREALLOC. This kills quite a bit of code that we
          don't need anymore if we just select a fixed size instead of an
          expensive runtime calculation of it. From Florian.
      
      36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
          from Florian.
      
      37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
          Florian.
      
      38) Attach NAT extension on-demand from masquerade and pptp helper
          path, from Florian.
      
      39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.
      
      40) Speed up netns by selective calls of synchronize_net(), from
          Florian Westphal.
      
      41) Silence gcc stack size warning on 32-bit arch in snmp helper,
          from Florian.
      
      42) Unconditionally call nf_ct_ext_destroy(), even if we have no
          extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
          Liping Zhang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-samples-skb_mode-bug-fixes' · edd7f4ef
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      samples/bpf: two bug fixes to XDP_FLAGS_SKB_MODE attaching
      
      Two small bugfixes for:
       commit 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel · f76254a8
      Jesper Dangaard Brouer authored
      The xdp_tx_iptunnel program can be terminated in two ways: after
      N seconds or via Ctrl-C (SIGINT).  The SIGINT code path does not
      handle detaching the correct XDP program in case the program
      was attached with XDP_FLAGS_SKB_MODE.
      
      Fix this by storing the XDP flags as a global variable, which is
      available for the SIGINT handler function.
      
      Fixes: 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int · 6387d011
      Jesper Dangaard Brouer authored
      The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
      IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
      samples/bpf/ should use the correct type.
      
      Fixes: 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'xdp-netlink-ext-ack' · d74a32ac
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      xdp: use netlink extended ACK reporting
      
      This series is an attempt to make XDP more user friendly by
      exploiting the recently added netlink extended ACK
      reporting to carry messages to user space.
      
      David Ahern's iproute2 ext ack patches for ip link are sufficient
      to show the errors like this:
      
      Error: nfp: MTU too large w/ XDP enabled
      
      Where the message is coming directly from the driver.  There could
      still be a bit of a leap for a complete novice from the message
      above to the right settings, but it's a big improvement over the
      standard "Invalid argument" message.
      
      v1/non-rfc:
       - add a separate macro in patch 1;
       - add KBUILD_MODNAME as part of the message (Daniel);
       - don't print the error to logs in patch 1.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: make use of extended ack message reporting · 9861ce03
      Jakub Kicinski authored
      Try to carry error messages to the user via the netlink extended
      ack message attribute.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: make use of extended ack message reporting · d957c0f7
      Jakub Kicinski authored
      Try to carry error messages to the user via the netlink extended
      ack message attribute.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: propagate extended ack to XDP setup · ddf9f970
      Jakub Kicinski authored
      Drivers usually have a number of restrictions for running XDP
      - most common being buffer sizes, LRO and number of rings.
      Even though some drivers try to be helpful and print error
      messages experience shows that users don't often consult
      kernel logs on netlink errors.  Try to use the new extended
      ack mechanism to carry the message back to user space.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: add NULL-friendly helper for setting extended ACK message · 45d9b378
      Jakub Kicinski authored
      As we propagate extended ack reporting throughout various paths in
      the kernel it may be that the same function is called with the
      extended ack parameter passed as NULL.  One place where that happens
      is in drivers which have a centralized reconfiguration function
      called both from ndos and from ethtool_ops.  Add a new helper for
      setting the error message in such conditions.
      
      Existing helper is left as is to encourage propagating the ext ack
      fully wherever possible.  It also makes it clear in the code which
      messages may be lost due to ext ack being NULL.
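
The difference between the two kinds of helper can be sketched in user space as follows. struct ext_ack, SET_ERR_MSG(), TRY_SET_ERR_MSG() and reconfigure() are hypothetical names used only for illustration, and the MTU limit is an arbitrary assumption; the real helpers live in the kernel's netlink headers and differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal stand-in for an extended-ack object. */
struct ext_ack {
	const char *msg;
};

/* Strict helper: the caller guarantees extack is non-NULL. */
#define SET_ERR_MSG(extack, text) \
	do { (extack)->msg = (text); } while (0)

/* NULL-friendly helper: silently drops the message if extack is NULL. */
#define TRY_SET_ERR_MSG(extack, text)		\
	do {					\
		struct ext_ack *_ea = (extack);	\
		if (_ea)			\
			_ea->msg = (text);	\
	} while (0)

/*
 * A shared reconfiguration routine that may be reached from a path that
 * has an ext_ack (e.g. a netlink request) and from one that does not.
 */
static int reconfigure(unsigned int mtu, struct ext_ack *extack)
{
	if (mtu > 1500) {	/* arbitrary limit for the example */
		TRY_SET_ERR_MSG(extack, "MTU too large");
		return -1;
	}
	return 0;
}
```

Keeping two distinct macros makes every call site show whether a message might be dropped.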
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_ct_ext: invoke destroy even when ext is not attached · 8eeef235
      Liping Zhang authored
      For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
      then remove it from the nat_bysource_table via nat_extend->destroy.
      
      But now, the nat extension is attached on demand, so if the nat extension
      is not attached, we will not be notified when the ct is destroyed, i.e.
      we may fail to remove ct from the nat_bysource_table.
      
      So just keep it simple, even if the extension is not attached, we will
      still invoke the related ext->destroy. And this will also preserve the
      flexibility for the future extension.
      
      Fixes: 9a08ecfe ("netfilter: don't attach a nat extension by default")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge tag 'ipvs3-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next · d1908ca8
      Pablo Neira Ayuso authored
      Simon Horman says:
      
      ====================
      Third Round of IPVS Updates for v4.12
      
      please consider these enhancements to IPVS for v4.12.
      If it is too late for v4.12 then please consider them for v4.13.
      
      * Remove unused function
      * Correct comparison of unsigned value
      ====================
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: snmp: avoid stack size warning · 0e72f55f
      Florian Westphal authored
      net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
      of 1160 bytes is larger than 1024 bytes
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_queue: only call synchronize_net twice if nf_queue is active · 039b40ee
      Florian Westphal authored
      nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
      provided there is no nfqueue active in that net namespace (which is
      the common case).
      
      This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
      this gets called during netns cleanup so no packets should be queued.
      
      For the rare case of base chain being unregistered or module removal
      while nfqueue is in use the extra hiccup due to the packet drops isn't
      a big deal.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: don't call synchronize_rcu in nf_log_unset · c83fa196
      Florian Westphal authored
      nf_log_unregister() (which is what gets called in the logger backends
      module exit paths) does a (required, module is removed) synchronize_rcu().
      
      But nf_log_unset() is only called from pernet exit handlers. It doesn't
      free any memory so there appears to be no need to call synchronize_rcu.
      
      v2: Liping Zhang points out that nf_log_unregister() needs to be called
      after pernet unregister, else rmmod would become unsafe.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: batch synchronize_net calls during hook unregister · 933bd83e
      Florian Westphal authored
      synchronize_net is expensive and slows down netns cleanup a lot.
      
      We have two APIs to unregister a hook:
      nf_unregister_net_hook (which calls synchronize_net())
      and
      nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)
      
      Make nf_unregister_net_hook a wrapper around new helper
      __nf_unregister_net_hook, which unlinks the hook but does not free it.
      
      Then, we can call that helper in nf_unregister_net_hooks and then
      call synchronize_net() only once.
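
The batching idea can be modelled without any kernel machinery. In the sketch below, expensive_sync() merely counts calls and stands in for a synchronize_net()-style wait; struct hook and the hook_*() functions are hypothetical names, and freeing of the unlinked hooks is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: count how often the expensive barrier runs. */
static unsigned int sync_calls;

struct hook {
	bool linked;
};

static void expensive_sync(void)
{
	sync_calls++;	/* stands in for a synchronize_net()-style wait */
}

/* Unlink only; the caller is responsible for synchronizing afterwards. */
static void hook_unlink(struct hook *h)
{
	h->linked = false;
}

/* Single-hook API: unlink, then wait. */
static void hook_unregister(struct hook *h)
{
	hook_unlink(h);
	expensive_sync();
}

/* Batched API: unlink every hook first, then wait once for all of them. */
static void hooks_unregister(struct hook *hooks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		hook_unlink(&hooks[i]);
	if (n)
		expensive_sync();
}
```

Unregistering n hooks this way costs one wait instead of n, which is where the speedup in netns cleanup would come from.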
      
      Andrey Konovalov reports this change at least doubles syzkaller
      fuzzing speed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay · 73333626
      Abhishek Shah authored
      This patch allows users to enable/disable internal TX and/or RX
      clock delay for BCM5481x series PHYs so as to satisfy RGMII timing
      specifications.
      
      On a particular platform, whether TX and/or RX clock delay is required
      depends on how the PHY is connected to the MAC IP. This requirement can be
      specified through the "phy-mode" property in the platform device tree.
      Signed-off-by: Abhishek Shah <abhishek.shah@broadcom.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sunhme: fix spelling mistakes: "ParityErro" -> "ParityError" · d8325650
      Colin Ian King authored
      Trivial fix to spelling mistakes in a printk message.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8325650
    • Scott Wood's avatar
      bnx2x: Align RX buffers · 9b70de6d
      Scott Wood authored
      The bnx2x driver is not providing proper alignment on the receive buffers it
      passes to build_skb(), causing skb_shared_info to be misaligned.
      skb_shared_info contains an atomic, and while PPC normally supports
      unaligned accesses, it does not support unaligned atomics.
      
      Aligning the size of rx buffers will ensure that page_frag_alloc() returns
      aligned addresses.
      
      This can be reproduced on PPC by setting the network MTU to 1450 (or
      another value that is not a multiple of 4) and then generating sufficient
      inbound network traffic (one or two large "wget"s usually do it),
      producing the following oops:
      
      Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656
      Faulting instruction address: 0xc00000000080ef8c
      Oops: Kernel access of bad area, sig: 7 [#1]
      SMP NR_CPUS=2048
      NUMA
      PowerNV
      Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common
      CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761daf #2
      task: c00000ffd4892400 task.stack: c00000ffd4920000
      NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320
      REGS: c00000ffffc33710 TRAP: 0600   Not tainted  (4.11.0-rc8-00088-g4c761daf)
      MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
        CR: 24082042  XER: 00000000
      CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1
      GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100
      GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000
      GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000
      GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f
      GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001
      GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000
      GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a
      GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100
      NIP [c00000000080ef8c] __skb_clone+0xdc/0x140
      LR [c00000000080eee8] __skb_clone+0x38/0x140
      Call Trace:
      [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable)
      [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510
      [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80
      [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0
      [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260
      [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x]
      [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480
      [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0
      [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120
      [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200
      [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24
      [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110
      [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b70de6d
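
The arithmetic behind the fix above can be sketched as follows. This is a toy model under stated assumptions: the buffer size of 1550 bytes, the 64-byte alignment, and the back-to-back allocator are hypothetical values chosen for illustration, not taken from the driver.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def frag_offsets(buf_size, count):
    """Offsets from a toy allocator that packs buffers back to back."""
    return [i * buf_size for i in range(count)]


# Hypothetical unaligned buffer size: later buffers start at odd offsets,
# so anything placed at a fixed position inside them is misaligned too.
unaligned = frag_offsets(1550, 4)

# After rounding the size up, every buffer starts on an aligned offset.
aligned = frag_offsets(align_up(1550, 64), 4)
```

With the size rounded up to a multiple of the alignment, every offset the toy allocator hands out is itself a multiple of that alignment, which is the property the commit relies on.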
    • Arkadi Sharshevsky's avatar
      net: bridge: Fix improper taking over HW learned FDB · 58073b32
      Arkadi Sharshevsky authored
      Commit 7e26bf45 ("net: bridge: allow SW learn to take over HW fdb
      entries") added the ability to "take over an entry which was previously
      learned via HW when it shows up from a SW port".
      
      However, if an entry was learned via HW and then a control packet
      (e.g., ARP request) was trapped to the CPU, the bridge driver will
      update the entry and remove the externally learned flag, although the
      entry is still present in HW. Instead, only clear the externally learned
      flag in case of roaming.
      
      Fixes: 7e26bf45 ("net: bridge: allow SW learn to take over HW fdb entries")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Arkadi Sharashevsky <arkadis@mellanox.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58073b32
    • WANG Cong's avatar
      ipv4: get rid of ip_ra_lock · ba3f571d
      WANG Cong authored
      After commit 1215e51e ("ipv4: fix a deadlock in ip_ra_control"),
      we always take the RTNL lock in ip_ra_control(), which is the only place
      we update the ip_ra_chain list, so ip_ra_lock is no longer needed.
      
      As Eric points out, BH does not need to be disabled either; RCU readers
      don't care.
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba3f571d
    • Jesper Dangaard Brouer's avatar
      samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong · 5010e948
      Jesper Dangaard Brouer authored
      The struct bpf_map_def was extended in commit fb30d4b7 ("bpf: Add tests
      for map-in-map") with member unsigned int inner_map_idx.  This changed the size
      of the maps section in the generated ELF _kern.o files.
      
      Unfortunately the loader in bpf_load.c does not detect or handle this.  Thus,
      older _kern.o files became incompatible, and caused hard-to-debug errors
      where the syscall validation rejected the BPF_MAP_CREATE request.
      
      This patch only detects the situation and aborts load_bpf_file(). It also
      adds code comments warning people who read this loader for inspiration
      about these pitfalls.
      
      Fixes: fb30d4b7 ("bpf: Add tests for map-in-map")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5010e948
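
One way such a size check could look is sketched below. It is an illustration, not the loader's code: it assumes the map definition was five unsigned ints before the change and six after it (the sixth being inner_map_idx), and `count_maps` is a hypothetical helper.

```python
import struct

# Assumed layouts: five unsigned ints before the change, six after
# (the added one being inner_map_idx).
OLD_MAP_DEF = struct.Struct("=5I")
NEW_MAP_DEF = struct.Struct("=6I")


def count_maps(section, map_def=NEW_MAP_DEF):
    """Return the number of map definitions, or raise if the size is off."""
    if len(section) % map_def.size != 0:
        raise ValueError(
            f"maps section has wrong size ({len(section)}/{map_def.size})"
        )
    return len(section) // map_def.size
```

Under these assumptions, a section built from two old-layout definitions (40 bytes) is rejected because 40 is not a multiple of 24. A check of this kind is only a heuristic: six old-layout definitions occupy 120 bytes, which divides evenly and would be misread as five new-layout definitions.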