1. 10 Feb, 2022 12 commits
    • selftests: net: cmsg_sender: support icmp and raw sockets · de17e305
      Jakub Kicinski authored
      Support sending fake ICMP(v6) messages and UDP via RAW sockets.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: make cmsg_so_mark ready for more options · 49b78613
      Jakub Kicinski authored
      Parametrize the code so that it can support UDP and ICMP
      sockets in the future, and more cmsg types.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: rename cmsg_so_mark · a086ee24
      Jakub Kicinski authored
      Rename the file in prep for generalization.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ping6: support setting socket options via cmsg · 3ebb0b10
      Jakub Kicinski authored
      Minor reordering of the code and a call to sock_cmsg_send()
      gives us support for setting the common socket options via
      cmsg (the usual ones - SO_MARK, SO_TIMESTAMPING_OLD, SCM_TXTIME).
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ping6: support packet timestamping · e7b06046
      Jakub Kicinski authored
      Nothing prevents the user from requesting timestamping
      on ping6 sockets, yet timestamps are not going to be reported.
      Plumb the flags through.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ping6: remove a pr_debug() statement · 42652239
      Jakub Kicinski authored
      We have ftrace and BPF today, there's no need for printing arguments
      at the start of a function.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'ieee802154-for-davem-2022-02-10' of... · 9557167b
      David S. Miller authored
      Merge tag 'ieee802154-for-davem-2022-02-10' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154-next 2022-02-10
      
      An update from ieee802154 for your *net-next* tree.
      
      There is more ongoing in ieee802154 than usual. This will be the first pull
      request for this cycle, but I expect one more, depending on review and rework
      times.
      
      Pavel Skripkin ported the atusb driver over to the new USB api to avoid uninit
      problems as well as making use of the modern api without kmalloc() needs in the
      driver.
      
      Miquel Raynal landed some changes to ensure proper frame checksum checking with
      hwsim, documenting our use of wake and stop_queue and eliding a magic value by
      using the proper define.
      
      David Girault documented the address struct used in ieee802154.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · adc27288
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2022-02-09
      
      This series contains updates to ice driver only.
      
      Brett adds support for QinQ. This begins with code refactoring and
      re-organization of VLAN configuration functions to allow for
      introduction of VSI VLAN ops to enable setting and calling of
      respective operations based on device support of single or double
      VLANs. Implementations are added for outer VLAN support.
      
      To support QinQ, the device must be set to double VLAN mode (DVM).
      In order for this to occur, the DDP package and NVM must also support
      DVM. Functions to determine compatibility and properly configure the
      device are added as well as setting the proper bits to advertise and
      utilize the proper offloads. Support for VIRTCHNL_VF_OFFLOAD_VLAN_V2
      is also included to allow for VF to negotiate and utilize this
      functionality.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 45230829
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      1) Conntrack sets CHECKSUM_UNNECESSARY for UDP packets with no checksum,
         from Kevin Mitchell.
      
      2) skb->priority support for nfqueue, from Nicolas Dichtel.
      
      3) Remove conntrack extension register API, from Florian Westphal.
      
      4) Move nat destroy hook to nf_nat_hook instead, to remove
         nf_ct_ext_destroy(), also from Florian.
      
      5) Wrap pptp conntrack NAT hooks into single structure, from Florian Westphal.
      
      6) Support for tcp option set to noop for nf_tables, also from Florian.
      
      7) Do not run x_tables comment match from packet path in nf_tables,
         from Florian Westphal.
      
      8) Replace spinlock by cmpxchg() loop to update missed ct event,
         from Florian Westphal.
      
      9) Wrap cttimeout hooks into single structure, from Florian.
      
      10) Add fast nft_cmp expression for up to 16-bytes.
      
      11) Use cb->ctx to store context in ctnetlink dump, instead of using
          cb->args[], from Florian Westphal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
        netfilter: ctnetlink: use dump structure instead of raw args
        nfqueue: enable to set skb->priority
        netfilter: nft_cmp: optimize comparison for 16-bytes
        netfilter: cttimeout: use option structure
        netfilter: ecache: don't use nf_conn spinlock
        netfilter: nft_compat: suppress comment match
        netfilter: exthdr: add support for tcp option removal
        netfilter: conntrack: pptp: use single option structure
        netfilter: conntrack: remove extension register api
        netfilter: conntrack: handle ->destroy hook via nat_ops instead
        netfilter: conntrack: move extension sizes into core
        netfilter: conntrack: make all extensions 8-byte alignned
        netfilter: nfqueue: enable to get skb->priority
        netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY
      ====================
      
      Link: https://lore.kernel.org/r/20220209133616.165104-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
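Item 8 above replaces a spinlock with a cmpxchg() loop. As a rough user-space analogue, assuming the protected state is a bitmask of missed events, the pattern looks like the sketch below using C11 atomics. The function name missed_events_add() is invented for illustration and the kernel code may differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * OR new_events into *missed without taking a lock.
 * Returns the value that was stored before the update.
 */
static uint32_t missed_events_add(_Atomic uint32_t *missed, uint32_t new_events)
{
	uint32_t old = atomic_load(missed);

	/*
	 * If another thread changed *missed in the meantime, the exchange
	 * fails, old is refreshed with the current value, and we retry.
	 */
	while (!atomic_compare_exchange_weak(missed, &old, old | new_events))
		;
	return old;
}
```

For a plain OR, atomic_fetch_or() would give the same result; the explicit loop is shown because it mirrors the cmpxchg() form named in the summary and extends to updates that depend on the old value.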
    • tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. · 4f9bf2a2
      Sebastian Andrzej Siewior authored
      Commit
         9652dc2e ("tcp: relax listening_hash operations")
      
      removed the need to disable bottom half while acquiring
      listening_hash.lock. There are still two callers left which disable
      bottom half before the lock is acquired.
      
      On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
      as a lock to ensure that resources, that are protected by disabling
      bottom halves, remain protected.
      This leads to a circular locking dependency if the lock acquired with
      disabled bottom halves is also acquired with enabled bottom halves
      followed by disabling bottom halves. This is the reverse locking order.
      It has been observed with inet_listen_hashbucket::lock:
      
      local_bh_disable() + spin_lock(&ilb->lock):
        inet_listen()
          inet_csk_listen_start()
            sk->sk_prot->hash() := inet_hash()
              local_bh_disable()
              __inet_hash()
                spin_lock(&ilb->lock);
                  acquire(&ilb->lock);
      
      Reverse order: spin_lock(&ilb2->lock) + local_bh_disable():
        tcp_seq_next()
          listening_get_next()
            spin_lock(&ilb2->lock);
              acquire(&ilb2->lock);
      
        tcp4_seq_show()
          get_tcp4_sock()
            sock_i_ino()
              read_lock_bh(&sk->sk_callback_lock);
                acquire(softirq_ctrl)  // <---- whoops
                acquire(&sk->sk_callback_lock)
      
      Drop local_bh_disable() around __inet_hash() which acquires
      listening_hash->lock. Split inet_unhash() and acquire the
      listen_hashbucket lock without disabling bottom halves; the inet_ehash
      lock with disabled bottom halves.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
      Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
      Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
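The inversion described in this commit can be modelled with a very small order checker: remember which lock was held when another one was taken, and flag the reverse order when it shows up. This is only an illustrative sketch, not how lockdep is implemented. All names are hypothetical, and it only catches direct two-lock inversions, not longer cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 4

struct order_checker {
	/* before[a][b] is true once a was held while b was being taken */
	bool before[MAX_LOCKS][MAX_LOCKS];
	bool held[MAX_LOCKS];
};

static void checker_init(struct order_checker *c)
{
	memset(c, 0, sizeof(*c));
}

/* Returns false if taking `lock` now reverses an order seen earlier. */
static bool checker_acquire(struct order_checker *c, int lock)
{
	bool ok = true;

	for (int h = 0; h < MAX_LOCKS; h++) {
		if (!c->held[h] || h == lock)
			continue;
		if (c->before[lock][h])
			ok = false; /* earlier: lock then h; now: h then lock */
		c->before[h][lock] = true;
	}
	c->held[lock] = true;
	return ok;
}

static void checker_release(struct order_checker *c, int lock)
{
	c->held[lock] = false;
}
```

Treating softirq_ctrl as lock 0 and the listen hash bucket lock as lock 1 (with ilb->lock and ilb2->lock counted as one class, as in the trace above), the inet_hash() path records the order 0 then 1, and the seq_file path then trips the checker when it takes 0 while holding 1.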
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 1127170d
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2022-02-09
      
      We've added 126 non-merge commits during the last 16 day(s) which contain
      a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).
      
      The main changes are:
      
      1) Add custom BPF allocator for JITs that pack multiple programs into a huge
         page to reduce iTLB pressure, from Song Liu.
      
      2) Add __user tagging support in vmlinux BTF and utilize it from BPF
         verifier when generating loads, from Yonghong Song.
      
      3) Add per-socket fast path check guarding from cgroup/BPF overhead when
         used by only some sockets, from Pavel Begunkov.
      
      4) Continued libbpf deprecation work of APIs/features and removal of their
         usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
         and various others.
      
      5) Improve BPF instruction set documentation by adding byte swap
         instructions and cleaning up load/store section, from Christoph Hellwig.
      
      6) Switch BPF preload infra to light skeleton and remove libbpf dependency
         from it, from Alexei Starovoitov.
      
      7) Fix architecture-agnostic macros in libbpf for accessing syscall
         arguments from BPF progs for non-x86 architectures,
         from Ilya Leoshkevich.
      
      8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
         of 16-bit field with anonymous zero padding, from Jakub Sitnicki.
      
      9) Add new bpf_copy_from_user_task() helper to read memory from a different
         task than current. Add ability to create sleepable BPF iterator progs,
         from Kenny Yu.
      
      10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
          utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.
      
      11) Generate temporary netns names for BPF selftests to avoid naming
          collisions, from Hangbin Liu.
      
      12) Implement bpf_core_types_are_compat() with limited recursion for
          in-kernel usage, from Matteo Croce.
      
      13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
          to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
      
      14) Misc minor fixes to libbpf and selftests from various folks.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
        selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
        bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
        libbpf: Fix compilation warning due to mismatched printf format
        selftests/bpf: Test BPF_KPROBE_SYSCALL macro
        libbpf: Add BPF_KPROBE_SYSCALL macro
        libbpf: Fix accessing the first syscall argument on s390
        libbpf: Fix accessing the first syscall argument on arm64
        libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
        selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
        libbpf: Fix accessing syscall arguments on riscv
        libbpf: Fix riscv register names
        libbpf: Fix accessing syscall arguments on powerpc
        selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
        libbpf: Add PT_REGS_SYSCALL_REGS macro
        selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
        bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
        bpf: Fix leftover header->pages in sparc and powerpc code.
        libbpf: Fix signedness bug in btf_dump_array_data()
        selftests/bpf: Do not export subtest as standalone test
        bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: drop_monitor: support drop reason · 5cad527d
      Menglong Dong authored
      In the commit c504e5c2 ("net: skb: introduce kfree_skb_reason()")
      drop reason is introduced to the tracepoint of kfree_skb. Therefore,
      drop_monitor is able to report the drop reason to users by netlink.
      
      The drop reasons are reported as strings to users, which is exactly
      the same as what we do when reporting it to ftrace.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220209060838.55513-1-imagedong@tencent.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 09 Feb, 2022 28 commits
    • Merge branch 'Split bpf_sk_lookup remote_port field' · e5313968
      Alexei Starovoitov authored
      Jakub Sitnicki says:
      
      ====================
      
      Following the recent split-up of the bpf_sock dst_port field, apply the same
      technique to the bpf_sk_lookup remote_port field to make uAPI more user
      friendly.
      
      v1 -> v2:
      - Remove remote_port range check and cast to be16 in TEST_RUN for sk_lookup
        (kernel test robot)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup · 2ed0dc59
      Jakub Sitnicki authored
      Extend the context access tests for sk_lookup prog to cover the surprising
      case of a 4-byte load from the remote_port field, where the expected value
      is actually shifted by 16 bits.
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220209184333.654927-3-jakub@cloudflare.com
    • bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide · 9a69e2b3
      Jakub Sitnicki authored
      remote_port is another case of a BPF context field documented as a 32-bit
      value in network byte order for which the BPF context access converter
      generates a load of a zero-padded 16-bit integer in network byte order.
      
      First such case was dst_port in bpf_sock which got addressed in commit
      4421a582 ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").
      
      Loading 4-bytes from the remote_port offset and converting the value with
      bpf_ntohl() leads to surprising results, as the expected value is shifted
      by 16 bits.
      
      Reduce the confusion by splitting the field in two - a 16-bit field holding
      a big-endian integer, and a 16-bit zero-padding anonymous field that
      follows it.
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
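The 16-bit shift can be reproduced in plain user-space C. The sketch below mimics the split layout with an ordinary struct; struct port_field and both helpers are stand-ins for illustration, and the padding is given a name here only so it can be initialised (the commit describes it as an anonymous field).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Split layout: 16-bit big-endian port followed by 16 bits of zero padding. */
struct port_field {
	uint16_t remote_port; /* network byte order */
	uint16_t pad;         /* always zero */
};

/* The surprising access: 4-byte load followed by a 32-bit byte swap. */
static uint32_t load4_ntohl(const struct port_field *f)
{
	uint32_t raw;

	memcpy(&raw, f, sizeof(raw));
	return ntohl(raw);
}

/* The intended access: 2-byte load followed by a 16-bit byte swap. */
static uint16_t load2_ntohs(const struct port_field *f)
{
	return ntohs(f->remote_port);
}
```

With the port set to htons(8080), load2_ntohs() yields 8080 while load4_ntohl() yields 8080 << 16, because the two port bytes land in the upper half of the 32-bit big-endian value.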
    • ice: Add ability for PF admin to enable VF VLAN pruning · f1da5a08
      Brett Creeley authored
      VFs by default are able to see all tagged traffic regardless of trust
      and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers
      expect VFs to receive all VLAN tagged traffic with a matching
      destination MAC.
      
      Add an ethtool private flag 'vf-vlan-pruning' and set the default to
      off so VFs will receive all VLAN traffic directed towards them. When
      the flag is turned on, VF will only be able to receive untagged
      traffic or traffic with VLAN tags it has created interfaces for.
      
      Also, the flag cannot be changed while any VFs are allocated. This was
      done to simplify the implementation. So, if this flag is needed, then
      the PF admin must enable it. If the user tries to enable the flag while
      VFs are active, then print an unsupported message with the
      vf-vlan-pruning flag included. In case multiple flags were specified, this
      makes it clear to the user which flag failed.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
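The rule that the flag cannot change while VFs are allocated amounts to a simple guard. The sketch below uses a made-up struct pf_state and setter, not the driver's real types, and the -EOPNOTSUPP return value is an assumption based on the "unsupported message" mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PF state; not the driver's real struct. */
struct pf_state {
	unsigned int num_alloc_vfs;
	bool vf_vlan_pruning;
};

/* Returns 0 on success, or -EOPNOTSUPP if VFs are currently allocated. */
static int pf_set_vf_vlan_pruning(struct pf_state *pf, bool enable)
{
	if (pf->vf_vlan_pruning == enable)
		return 0;           /* nothing to change */
	if (pf->num_alloc_vfs > 0)
		return -EOPNOTSUPP; /* flag is frozen while VFs exist */
	pf->vf_vlan_pruning = enable;
	return 0;
}
```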
    • ice: Add support for 802.1ad port VLANs VF · cbc8b564
      Brett Creeley authored
      Currently there is only support for 802.1Q port VLANs on SR-IOV VFs. Add
      support to also allow 802.1ad port VLANs when double VLAN mode is
      enabled.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev · 1babaf77
      Brett Creeley authored
      In order for the driver to support 802.1ad VLAN filtering and offloads,
      it needs to advertise those VLAN features and also support modifying
      those VLAN features, so make the necessary changes to
      ice_set_netdev_features(). By default, enable CTAG insertion/stripping
      and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM).
      Also, in DVM, enable STAG filtering by default. This is done by
      setting the feature bits in netdev->features. Also, in DVM, support
      toggling of STAG insertion/stripping, but don't enable them by
      default. This is done by setting the feature bits in
      netdev->hw_features.
      
      Since 802.1ad VLAN filtering and offloads are only supported in DVM, make
      sure they are not enabled by default and that they cannot be enabled
      during runtime, when the device is in SVM.
      
      Add an implementation for the ndo_fix_features() callback. This is
      needed since the hardware cannot support multiple VLAN ethertypes for
      VLAN insertion/stripping simultaneously and all supported VLAN filtering
      must either be enabled or disabled together.
      
      Disable inner VLAN stripping by default when DVM is enabled. If a VSI
      supports stripping the inner VLAN in DVM, then it will have to configure
      that during runtime. For example if a VF is configured in a port VLAN
      while DVM is enabled it will be allowed to offload inner VLANs.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
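The two constraints that motivate ndo_fix_features() here (one VLAN ethertype at a time for insertion/stripping, and filtering toggled as a group) can be sketched as a pure function over a feature mask. The bit values below are illustrative, not the kernel's NETIF_F_* bits. The choices to keep CTAG over STAG and to turn a partial filter request into "all on" are assumptions of this sketch, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the kernel's NETIF_F_* bits differ. */
#define F_CTAG_RX     (1u << 0)
#define F_CTAG_TX     (1u << 1)
#define F_STAG_RX     (1u << 2)
#define F_STAG_TX     (1u << 3)
#define F_CTAG_FILTER (1u << 4)
#define F_STAG_FILTER (1u << 5)

#define F_CTAG_OFFLOADS (F_CTAG_RX | F_CTAG_TX)
#define F_STAG_OFFLOADS (F_STAG_RX | F_STAG_TX)
#define F_VLAN_FILTERS  (F_CTAG_FILTER | F_STAG_FILTER)

/* Adjust a requested feature set so it respects both hardware limits. */
static uint32_t fix_vlan_features(uint32_t requested)
{
	uint32_t features = requested;

	/* Only one ethertype can be offloaded at a time: keep CTAG here. */
	if ((features & F_CTAG_OFFLOADS) && (features & F_STAG_OFFLOADS))
		features &= ~F_STAG_OFFLOADS;

	/* Filtering is all-or-nothing: a partial request enables both. */
	if (features & F_VLAN_FILTERS)
		features |= F_VLAN_FILTERS;

	return features;
}
```

The sketch models a device in DVM only; in SVM the 802.1ad bits would not be offered at all, as the commit message notes.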
    • ice: Support configuring the device to Double VLAN Mode · a1ffafb0
      Brett Creeley authored
      In order to support configuring the device in Double VLAN Mode (DVM),
      the DDP and FW have to support DVM. If both support DVM, the PF that
      downloads the package needs to update the default recipes, set the
      VLAN mode, and update boost TCAM entries.
      
      To support updating the default recipes in DVM, add support for
      updating an existing switch recipe's lkup_idx and mask. This is done
      by first calling the get recipe AQ (0x0292) with the desired recipe
      ID. Then, if that is successful, update one of the lookup indices
      (lkup_idx) and its associated mask if the mask is valid; otherwise
      the already existing mask will be used.
      
      The VLAN mode of the device has to be configured while the global
      configuration lock is held during the DDP download, specifically after
      the DDP has been downloaded. If supported, the device will default to
      DVM.
      Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
      Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 · cc71de8f
      Brett Creeley authored
      Add support for the VF driver to be able to request
      VIRTCHNL_VF_OFFLOAD_VLAN_V2, negotiate its VLAN capabilities via
      VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS, add/delete VLAN filters, and
      enable/disable VLAN offloads.
      
      VFs supporting VIRTCHNL_VF_OFFLOAD_VLAN_V2 will be able to use the
      following virtchnl opcodes:
      
      VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
      VIRTCHNL_OP_ADD_VLAN_V2
      VIRTCHNL_OP_DEL_VLAN_V2
      VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2
      VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2
      VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2
      VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2
      
      Legacy VF drivers may expect the initial VLAN stripping settings to be
      configured by the PF, so the PF initializes VLAN stripping based on the
      VIRTCHNL_OP_GET_VF_RESOURCES opcode. However, with VLAN support via
      VIRTCHNL_VF_OFFLOAD_VLAN_V2, this function is only expected to be used
      for VFs that only support VIRTCHNL_VF_OFFLOAD_VLAN, which will only
      be supported when a port VLAN is configured. Update the function
      based on the new expectations. Also, change the message when the PF
      can't enable/disable VLAN stripping to a dev_dbg() as this isn't fatal.
      
      When a VF isn't in a port VLAN and it only supports
      VIRTCHNL_VF_OFFLOAD_VLAN when Double VLAN Mode (DVM) is enabled, then
      the PF needs to reject the VIRTCHNL_VF_OFFLOAD_VLAN capability and
      configure the VF in software only VLAN mode. To do this add the new
      function ice_vf_vsi_cfg_legacy_vlan_mode(), which updates the VF's
      inner and outer ice_vsi_vlan_ops functions and sets up software only
      VLAN mode.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Add hot path support for 802.1Q and 802.1ad VLAN offloads · 0d54d8f7
      Brett Creeley authored
      Currently the driver only supports 802.1Q VLAN insertion and stripping.
      However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q
      and 802.1ad VLAN insertion and stripping will be supported. Unfortunately
      the VSI context parameters only allow for one VLAN ethertype at a time
      for VLAN offloads so only one or the other VLAN ethertype offload can be
      supported at once.
      
      To support this, multiple changes are needed.
      
      Rx path changes:
      
      [1] In DVM, the Rx queue context l2tagsel field needs to be cleared so
      the outermost tag shows up in the l2tag2_2nd field of the Rx flex
      descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain
      1 to support SVM configurations.
      
      [2] Modify the ice_test_staterr() function to take a __le16 instead of
      the ice_32b_rx_flex_desc union pointer so this function can be used for
      both rx_desc->wb.status_error0 and rx_desc->wb.status_error1.
      
      [3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that
      checks if there is a VLAN tag in l2tag1 or l2tag2_2nd.
      
      [4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX
      is enabled in netdev->features. If it is, then this is the VLAN
      ethertype that needs to be added to the stripping VLAN tag. Since
      ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled
      simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad.
      
      Tx path changes:
      
      [1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx
      context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was
      added to the list of tx_flags to handle this case.
      
      [2] When the stack requests the VLAN tag to be offloaded on Tx, the
      driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or
      ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1
      respectively. To determine which location to use, set a bit in the Tx
      ring flags field during ring allocation that can be used to determine
      which field to use in the Tx descriptor. In DVM, always use l2tag2,
      and in SVM, always use l2tag1.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
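The Rx tag lookup and the Tx field choice described above can be sketched roughly as follows. The descriptor struct, the "tag present" bit positions and the function names are simplified stand-ins, not the driver's definitions, and byte-order conversion of the little-endian descriptor fields is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical descriptor; field names follow the commit text. */
struct rx_desc {
	uint16_t status_error0;
	uint16_t status_error1;
	uint16_t l2tag1;
	uint16_t l2tag2_2nd;
};

/* Assumed "tag present" bits; real bit positions are hardware-defined. */
#define STATUS0_L2TAG1_PRESENT (1u << 0)
#define STATUS1_L2TAG2_PRESENT (1u << 0)

/* Takes a plain status word so it works for either status_error field. */
static bool test_staterr(uint16_t status, uint16_t bits)
{
	return (status & bits) != 0;
}

/* Return the stripped VLAN tag from whichever field holds it, else 0. */
static uint16_t get_vlan_tag_from_rx_desc(const struct rx_desc *d)
{
	if (test_staterr(d->status_error0, STATUS0_L2TAG1_PRESENT))
		return d->l2tag1;
	if (test_staterr(d->status_error1, STATUS1_L2TAG2_PRESENT))
		return d->l2tag2_2nd;
	return 0; /* no tag was stripped by hardware */
}

enum tx_tag_field { TX_L2TAG1, TX_L2TAG2 };

/* Tx side: DVM always uses l2tag2, SVM always uses l2tag1. */
static enum tx_tag_field tx_tag_field_for_mode(bool double_vlan_mode)
{
	return double_vlan_mode ? TX_L2TAG2 : TX_L2TAG1;
}
```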
    • ice: Add outer_vlan_ops and VSI specific VLAN ops implementations · c31af68a
      Brett Creeley authored
      Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
      ops are only available when the device is in Double VLAN Mode (DVM).
      Depending on the VSI type, the requirements for what operations to
      use/allow differ.
      
      By default all VSIs have unsupported inner and outer VSI VLAN ops. This
      implementation was chosen to prevent unexpected crashes due to null
      pointer dereferences. Instead, if a VSI calls an unsupported op, it will
      just return -EOPNOTSUPP.
      
      Add implementations to support modifying outer VLAN fields for VSI
      context. This includes the ability to modify VLAN stripping, insertion,
      and the port VLAN based on the outer VLAN handling fields of the VSI
      context.
      
      These functions should only ever be used if DVM is enabled because that
      means the firmware supports the outer VLAN fields in the VSI context. If
      the device is in DVM, then always use the outer_vlan_ops, else use the
      vlan_ops since the device is in Single VLAN Mode (SVM).
      
      Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
      ice_vsi_vlan_setup() as the latter function is specific to the PF and
      all other VSI types that need an untagged VLAN 0 filter already do this
      in their specific flows. Without this change, Flow Director is failing
      to initialize because it does not implement any VSI VLAN ops.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
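The "unsupported by default" approach can be illustrated with a small ops table whose slots start out pointing at stubs instead of NULL. The struct layout and function names below are invented for the sketch and carry only two of the operations a real table would have.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct vsi; /* opaque in this sketch */

/* Hypothetical, trimmed-down ops table. */
struct vsi_vlan_ops {
	int (*add_vlan)(struct vsi *vsi, uint16_t vid);
	int (*ena_stripping)(struct vsi *vsi);
};

static int unsupported_add_vlan(struct vsi *vsi, uint16_t vid)
{
	(void)vsi;
	(void)vid;
	return -EOPNOTSUPP;
}

static int unsupported_ena_stripping(struct vsi *vsi)
{
	(void)vsi;
	return -EOPNOTSUPP;
}

/* Every slot gets a stub, so callers never dereference a NULL pointer. */
static void vsi_vlan_ops_init_unsupported(struct vsi_vlan_ops *ops)
{
	ops->add_vlan = unsupported_add_vlan;
	ops->ena_stripping = unsupported_ena_stripping;
}

/* Example of a real implementation a VSI type might install later. */
static int example_add_vlan(struct vsi *vsi, uint16_t vid)
{
	(void)vsi;
	(void)vid;
	return 0;
}
```

A VSI type that supports an operation overwrites just that slot; any slot left untouched keeps returning -EOPNOTSUPP when called.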
    • ice: Adjust naming for inner VLAN operations · 7bd527aa
      Brett Creeley authored
      Current operations act on inner VLAN fields. To support double VLAN, outer
      VLAN operations and functions will be implemented. Add the "inner" naming
      to existing VLAN operations to distinguish them from the upcoming outer
      values and functions. Some spacing adjustments are made to align
      values.
      
      Note that the inner is not talking about a tunneled VLAN, but the second
      VLAN in the packet. For SVM the driver uses inner or single VLAN
      filtering and offloads and in Double VLAN Mode the driver uses the
      inner filtering and offloads for SR-IOV VFs in port VLANs in order to
      support offloading the guest VLAN while a port VLAN is configured.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Use the proto argument for VLAN ops · 2bfefa2d
      Brett Creeley authored
      Currently the proto argument is unused. This is because the driver only
      supports 802.1Q VLAN filtering. This policy is enforced via netdev
      features that the driver sets up when configuring the netdev, so the
      proto argument won't ever be anything other than 802.1Q. However, this
      will allow for future iterations of the driver to seamlessly support
      802.1ad filtering. Begin using the proto argument and extend the related
      structures to support its use.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Refactor vf->port_vlan_info to use ice_vlan · a19d7f7f
      Brett Creeley authored
      The current vf->port_vlan_info variable is a packed u16 that contains
      the port VLAN ID and QoS/prio value. This is fine, but changes are
      incoming that allow for an 802.1ad port VLAN. Add flexibility by
      changing the vf->port_vlan_info member to be an ice_vlan structure.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
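To see why a struct is more flexible than the packed u16, the sketch below unpacks such a value into separate fields and adds room for a TPID. It assumes the packing follows the 802.1Q tag layout (VLAN ID in the low 12 bits, priority in the top 3 bits). struct vlan_info and the helpers are illustrative names, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define VID_MASK   0x0fffu /* low 12 bits: VLAN ID (assumed layout) */
#define PRIO_MASK  0xe000u /* top 3 bits: priority (assumed layout) */
#define PRIO_SHIFT 13

/* Hypothetical unpacked form with room for the ethertype. */
struct vlan_info {
	uint16_t tpid; /* 0x8100 for 802.1Q, 0x88a8 for 802.1ad */
	uint16_t vid;
	uint8_t prio;
};

/* Split a packed VID/priority word and attach the given TPID. */
static struct vlan_info vlan_from_packed(uint16_t packed, uint16_t tpid)
{
	struct vlan_info v = {
		.tpid = tpid,
		.vid = (uint16_t)(packed & VID_MASK),
		.prio = (uint8_t)((packed & PRIO_MASK) >> PRIO_SHIFT),
	};
	return v;
}

/* Pack VID and priority back into one word; the TPID cannot be kept. */
static uint16_t vlan_to_packed(const struct vlan_info *v)
{
	return (uint16_t)((v->vid & VID_MASK) |
			  ((unsigned int)v->prio << PRIO_SHIFT));
}
```

The round trip loses the TPID, which is the limitation that makes the packed form unsuitable once 802.1ad port VLANs are allowed.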
    • ice: Introduce ice_vlan struct · fb05ba12
      Brett Creeley authored
      Add a new struct for VLAN related information. Currently this holds
      VLAN ID and priority values, but will be expanded to hold TPID value.
      This reduces the changes necessary if any other values are added in
      future. Remove the action argument from these calls as it's always
      ICE_FWD_VSI.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Brett Creeley's avatar
      ice: Add new VSI VLAN ops · bc42afa9
      Brett Creeley authored
      Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
      offloads require more flexibility when configuring VLANs. The VSI VLAN
      interface will allow flexibility for configuring VLANs for all VSI
      types. Add new files to separate the VSI VLAN ops and move functions to
      make the code more organized.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      bc42afa9
    • Brett Creeley's avatar
      ice: Add helper function for adding VLAN 0 · 3e0b5971
      Brett Creeley authored
      There are multiple places where VLAN 0 is being added. Create a function
      to be called in order to minimize changes as the implementation is expanded
      to support double VLAN and avoid duplicated code.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3e0b5971
    • Brett Creeley's avatar
      ice: Refactor spoofcheck configuration functions · daf4dd16
      Brett Creeley authored
      Add functions to configure Tx VLAN antispoof based on iproute
      configuration and/or VLAN mode and VF driver support. This is needed
      later so the driver can control when it can be configured. Also, add
      functions that can be used to enable and disable MAC and VLAN
      spoofcheck. Move spoofchk configuration during VSI setup into the
      SR-IOV initialization path and into the post VSI rebuild flow for VF
      VSIs.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      daf4dd16
    • Andrii Nakryiko's avatar
      libbpf: Fix compilation warning due to mismatched printf format · dc37dc61
      Andrii Nakryiko authored
      On ppc64le architecture __s64 is long int and requires %ld. Cast to
      ssize_t and use %zd to avoid architecture-specific specifiers.
      
      Fixes: 4172843e ("libbpf: Fix signedness bug in btf_dump_array_data()")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org
      dc37dc61
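The cast-and-%zd approach described in the libbpf fix can be sketched in userspace C. This is an illustration under assumptions: `s64_t` stands in for the kernel's `__s64`, and `format_s64()` is a hypothetical helper, not a libbpf function. On targets where ssize_t is narrower than 64 bits the cast would narrow the value, so the sketch suits values known to fit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Stand-in for __s64, whose underlying type (long vs long long) varies by architecture. */
typedef int64_t s64_t;

/* Formats a signed 64-bit value with %zd after casting to ssize_t, so the
 * format string does not depend on the architecture-specific type of s64_t. */
static int format_s64(char *buf, size_t len, s64_t value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%zd", (ssize_t)value);
}
```

The design choice is to fix the type at the call site rather than select between %ld and %lld per architecture.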
    • Oleksij Rempel's avatar
      net: usb: smsc95xx: add generic selftest support · 1710b52d
      Oleksij Rempel authored
      Provide generic selftest support. Tested with LAN9500 and LAN9512.
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1710b52d
    • Wang Qing's avatar
      net: ethernet: cavium: use div64_u64() instead of do_div() · 038fcdaf
      Wang Qing authored
      do_div() does a 64-by-32 division.
      When the divisor is u64, do_div() truncates it to 32 bits, which means
      a divisor that tests non-zero can still be truncated to zero for the
      division.

      Fix do_div.cocci warning:
      do_div() does a 64-by-32 division, please consider using div64_u64 instead.
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      038fcdaf
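The truncation hazard behind the do_div.cocci warning can be reproduced in plain C. The helpers below are hypothetical userspace models, not the kernel macros: `truncated_divisor()` mimics what happens to a u64 divisor handed to a 64-by-32 division, and `div_64_by_64()` models the full-width division that div64_u64() provides.

```c
#include <assert.h>
#include <stdint.h>

/* Models the narrowing a 64-bit divisor undergoes in a 64-by-32 division. */
static uint32_t truncated_divisor(uint64_t divisor)
{
	return (uint32_t)divisor;
}

/* Full 64-by-64 division; the divisor keeps all of its bits. */
static uint64_t div_64_by_64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}
```

For example, a divisor of `1ULL << 32` passes a non-zero check, yet its low 32 bits are all zero, so a 64-by-32 division would end up dividing by zero.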
    • Po Liu's avatar
      net:enetc: enetc qos using the CBDR dma alloc function · 237d20c2
      Po Liu authored
      Now we can use enetc_cbd_alloc_data_mem() to replace the complicated
      DMA data allocation method and the basic CBDR memory setting.
      Signed-off-by: Po Liu <po.liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      237d20c2
    • Po Liu's avatar
      net:enetc: command BD ring data memory alloc as one function alone · 0cc11cdb
      Po Liu authored
      Separate the CBDR data memory allocation into a standalone function.
      This is convenient for other parts that need it, for example the
      ENETC QoS part.
      Reported-and-suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Po Liu <po.liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0cc11cdb
    • Po Liu's avatar
      net:enetc: allocate CBD ring data memory using DMA coherent methods · b3a723db
      Po Liu authored
      Replace the dma_map_single() streaming DMA mapping with the DMA
      coherent method dma_alloc_coherent(), which is simpler.

      Tim Gardner found the use of dma_map_single() to be improper. Claudiu
      Manoil and Jakub Kicinski suggested using dma_alloc_coherent().
      Discussion at:
      
      https://lore.kernel.org/netdev/AM9PR04MB8397F300DECD3C44D2EBD07796BD9@AM9PR04MB8397.eurprd04.prod.outlook.com/t/
      
      Fixes: 888ae5a3 ("net: enetc: add tc flower psfp offload driver")
      cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reported-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Po Liu <po.liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3a723db
    • David S. Miller's avatar
      Merge branch 'dpaa2-eth-sw-TSO' · 62b5b162
      David S. Miller authored
      Ioana Ciornei says:
      
      ====================
      dpaa2-eth: add support for software TSO
      
      This series adds support for driver level TSO in the dpaa2-eth driver.
      
      The first 5 patches lay the groundwork for the actual feature:
      rearranging some variable declarations, cleaning up the interaction
      with the S/G Table buffer cache, etc.
      
      The 6th patch adds the actual driver level software TSO support by using
      the usual tso_build_hdr()/tso_build_data() APIs and creates the S/G FDs.
      
      With this patch set we can see the following improvement in a TCP flow
      running on a single A72@2.2GHz of the LX2160A SoC:
      
      before: 6.38Gbit/s
      after:  8.48Gbit/s
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62b5b162
    • Ioana Ciornei's avatar
      soc: fsl: dpio: read the consumer index from the cache inhibited area · 86ec882f
      Ioana Ciornei authored
      Once we added support in the dpaa2-eth for driver level software TSO we
      observed the following situation: if the EQCR CI (consumer index) is
      read from the cache-enabled area we sometimes end up with a computed
      value of available enqueue entries bigger than the size of the ring.
      
      This eventually leads to the same FD being enqueued multiple times,
      which causes the same FD to end up on the Tx confirmation path again
      and the same skb to be freed twice.
      
      Just read the consumer index from the cache inhibited area so that we
      avoid this situation.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86ec882f
    • Ioana Ciornei's avatar
      dpaa2-eth: add support for software TSO · 3dc709e0
      Ioana Ciornei authored
      This patch adds support for driver level TSO in the dpaa2-eth driver
      using the TSO API.
      
      There is not much to say about this specific implementation. We use
      the usual tso_build_hdr() and tso_build_data() to create each data
      segment, and we create an array of S/G FDs in which the first S/G
      entry references the header data and the remaining ones the data
      portion.
      
      For the S/G Table buffer we use the same cache of buffers used on the
      other non-GSO cases - dpaa2_eth_sgt_get() and dpaa2_eth_sgt_recycle().
      
      We cannot keep a DMA coherent buffer for all the TSO headers because the
      DPAA2 architecture does not work in a ring based fashion so we just
      allocate a buffer each time.
      
      Even with these limitations we get the following improvement in TCP
      termination on the LX2160A SoC, on a single A72 core running at 2.2GHz.
      
      before: 6.38Gbit/s
      after:  8.48Gbit/s
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dc709e0
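The segment arithmetic behind software TSO can be sketched as below. This is only an illustration of how a large payload maps onto per-segment frames, assuming one frame per MSS-sized chunk. `tso_segment_count()` and `tso_segment_len()` are hypothetical helpers, not part of the kernel TSO API or the dpaa2-eth driver.

```c
#include <assert.h>

/* Number of segments (one frame each, by assumption) for a payload cut at mss bytes. */
static unsigned int tso_segment_count(unsigned int payload_len, unsigned int mss)
{
	assert(mss > 0);
	return (payload_len + mss - 1) / mss;
}

/* Payload bytes carried by segment idx (0-based); only the last can be short. */
static unsigned int tso_segment_len(unsigned int payload_len, unsigned int mss,
				    unsigned int idx)
{
	unsigned int offset = idx * mss;

	assert(mss > 0 && offset < payload_len);
	return payload_len - offset < mss ? payload_len - offset : mss;
}
```

Each such segment would then get its own header entry followed by the data entries, matching the layout the commit message describes.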
    • Ioana Ciornei's avatar
      dpaa2-eth: work with an array of FDs · a4ca448e
      Ioana Ciornei authored
      Up until now, the __dpaa2_eth_tx function used a single FD on the stack
      to construct the structure to be enqueued. Since we are now preparing
      the ground work to add support for TSO done in software at the driver
      level, the same function needs to work with an array of FDs and enqueue
      as many as the build_*_fd functions create.
      
      Make the necessary adjustments in order to do this. These include:
      keeping an array of FDs in a percpu structure, cleaning up the necessary
      FDs before populating it, and then retrying the enqueue process until
      all the generated FDs are enqueued or until we reach the maximum number
      of retries.
      
      This patch does not change the fact that only a single FD will result
      from a __dpaa2_eth_tx call but rather just creates the necessary changes
      for the next patch.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4ca448e
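The "enqueue until everything is sent or retries run out" loop can be sketched as follows. All names here are hypothetical, frames are modelled as plain ints, and the policy of counting a retry only when a call makes no progress is an assumption of this sketch rather than a statement about the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enqueue callback: takes up to n frames, returns how many it
 * accepted (0 when the ring is busy). */
typedef size_t (*enqueue_fn)(const int *fds, size_t n, void *ctx);

/* Enqueues the remaining frames until all are sent or max_retries calls
 * have made no progress. Returns the number of frames enqueued. */
static size_t enqueue_all(const int *fds, size_t total, size_t max_retries,
			  enqueue_fn enqueue, void *ctx)
{
	size_t done = 0, retries = 0;

	assert(fds != NULL && enqueue != NULL);
	while (done < total && retries < max_retries) {
		size_t n = enqueue(fds + done, total - done, ctx);

		if (n == 0)
			retries++;
		done += n;
	}
	return done;
}

/* Fake ring for illustration: accepts at most two frames per call and
 * counts the calls through ctx. */
static size_t take_two(const int *fds, size_t n, void *ctx)
{
	size_t *calls = ctx;

	(void)fds;
	(*calls)++;
	return n < 2 ? n : 2;
}

/* Fake ring that is always busy; also counts the calls through ctx. */
static size_t always_busy(const int *fds, size_t n, void *ctx)
{
	size_t *calls = ctx;

	(void)fds;
	(void)n;
	(*calls)++;
	return 0;
}
```

With `take_two`, five frames go out in three calls. With `always_busy`, the loop gives up after `max_retries` calls and reports zero frames enqueued.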
    • Ioana Ciornei's avatar
      dpaa2-eth: use the S/G table cache also for the normal S/G path · a4218aef
      Ioana Ciornei authored
      Instead of allocating memory for an S/G table each time a nonlinear skb
      is processed, and then freeing it on the Tx confirmation path, use the
      S/G table cache in order to reuse the memory.
      
      For this to work we have to change the size of the cached buffers so
      that it can hold the maximum number of scatterlist entries.
      
      Other than that, each allocate/free call is replaced by a call to the
      dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle functions, introduced in the
      previous patch.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4218aef
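The get/recycle pattern for reusing S/G table buffers might look like the sketch below. It is a userspace model under assumptions: `struct sgt_cache`, `sgt_get()`, `sgt_recycle()` and the capacity of 4 are hypothetical, and plain `calloc()`/`free()` stand in for whatever allocator the driver uses.

```c
#include <assert.h>
#include <stdlib.h>

#define SGT_CACHE_SIZE 4 /* assumed capacity, for illustration only */

/* Small stack of previously used buffers, all of the same size. */
struct sgt_cache {
	void *buf[SGT_CACHE_SIZE];
	unsigned int count;
};

/* Returns a cached buffer when one is available, otherwise allocates one.
 * A reused buffer keeps its old contents; the caller must clear it. */
static void *sgt_get(struct sgt_cache *c, size_t buf_size)
{
	assert(c != NULL);
	if (c->count > 0)
		return c->buf[--c->count];
	return calloc(1, buf_size);
}

/* Puts the buffer back into the cache, or frees it when the cache is full. */
static void sgt_recycle(struct sgt_cache *c, void *buf)
{
	assert(c != NULL);
	if (c->count < SGT_CACHE_SIZE)
		c->buf[c->count++] = buf;
	else
		free(buf);
}
```

Because any cached buffer may be handed to any caller, every buffer must be large enough for the biggest table, which is why the commit resizes the cached buffers to hold the maximum number of scatterlist entries.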