1. 15 May, 2020 30 commits
    • Merge branch 'Implement-classifier-action-terse-dump-mode' · cd2809cc
      David S. Miller authored
      Vlad Buslov says:
      
      ====================
      Implement classifier-action terse dump mode
      
      Output rate of current upstream kernel TC filter dump implementation is
      relatively low (~100k rules/sec depending on configuration). This
      constraint impacts performance of software switch implementations that
      rely on TC for their datapath implementation and periodically call TC
      filter dump to update rules stats. Moreover, TC filter dump outputs a lot
      of static data that doesn't change during the filter lifecycle (filter
      key, specific action details, etc.), which constitutes a significant
      portion of the payload of the resulting netlink packets and increases the
      number of syscalls necessary to dump all filters on a particular Qdisc. In
      order to significantly improve the filter dump rate, this patch set
      implements a new mode of TC filter dump operation named "terse dump" mode.
      In this mode only parameters necessary to identify the filter (handle,
      action cookie, etc.) and data that can change during filter lifecycle
      (filter flags, action stats, etc.) are preserved in dump output while
      everything else is omitted.
      
      Userspace API is implemented using a new TCA_DUMP_FLAGS TLV with the only
      available flag value TCA_DUMP_FLAGS_TERSE. Internally, the new API requires
      individual classifier support (new tcf_proto_ops->terse_dump()
      callback). Support for action terse dump is implemented in act API and
      doesn't require changing individual action implementations.
      
      The following table provides performance comparison between regular
      filter dump and new terse dump mode for two classifier-action profiles:
      one minimal config with L2 flower classifier and single gact action and
      another heavier config with L2+5tuple flower classifier with
      tunnel_key+mirred actions.
      
       Classifier-action type      |        dump |  terse dump | X improvement
                                   | (k rules/s) | (k rules/s) |
      -----------------------------+-------------+-------------+---------------
       L2 with gact                |       141.8 |       293.2 |          2.07
       L2+5tuple tunnel_key+mirred |        76.4 |       198.8 |          2.60
      
      Benchmark details: to measure the rate, the tc filter dump and terse dump
      commands are invoked on an ingress Qdisc that has one million filters
      configured, using the following commands.
      
      > time sudo tc -s filter show dev ens1f0 ingress >/dev/null
      
      > time sudo tc -s filter show terse dev ens1f0 ingress >/dev/null
      
      Each value in the results table is calculated by dividing the 1000000
      total rules by the "real" time reported by the time command.
      
      Setup details: 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32GB memory
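
The calculation behind the results table can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the timing in the usage note is made up rather than taken from the benchmark.

```c
#include <assert.h>

/* Dump rate: total number of rules divided by the wall-clock ("real")
 * time in seconds reported by the time command. */
double dump_rate(double total_rules, double real_seconds)
{
    assert(real_seconds > 0.0); /* a non-positive time is not a valid measurement */
    return total_rules / real_seconds;
}

/* Improvement factor of terse dump over regular dump (last table column). */
double improvement(double regular_rate, double terse_rate)
{
    assert(regular_rate > 0.0);
    return terse_rate / regular_rate;
}
```

For instance, a hypothetical run that dumps 1000000 rules in 10 seconds of "real" time gives a rate of 100000 rules/sec, and improvement(141.8, 293.2) is roughly 2.07, matching the first table row.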
      ====================
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: implement flower classifier terse dump tests · e7534fd4
      Vlad Buslov authored
      Implement two basic tests to verify terse dump functionality of flower
      classifier:
      
      - Test that verifies that terse dump works.
      
      - Test that verifies that terse dump doesn't print filter key.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_flower: implement terse dump support · 0348451d
      Vlad Buslov authored
      Implement tcf_proto_ops->terse_dump() callback for flower classifier. Only
      dump handle, flags and action data in terse mode.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: implement terse dump support in act · ca44b738
      Vlad Buslov authored
      Extend tcf_action_dump() with boolean argument 'terse' that is used to
      request terse-mode action dump. In terse mode only essential data needed to
      identify particular action (action kind, cookie, etc.) and its stats is put
      into the resulting skb and everything else is omitted. Implement
      tcf_exts_terse_dump() helper in cls API that is intended to be used to
      request terse dump of all exts (actions) attached to the filter.
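
As a rough userspace illustration of what a 'terse' boolean changes in a dump routine, the sketch below formats a toy action record either fully or tersely. The struct, its fields and the text output format are hypothetical stand-ins; the real tcf_action_dump() emits netlink attributes into an skb and is not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy action record (hypothetical, for illustration only). */
struct ex_action {
    const char *kind;            /* identifies the action type */
    unsigned long long cookie;   /* user-supplied identifier */
    unsigned long long packets;  /* stats that change over time */
    const char *params;          /* static, action-specific details */
};

/* Format the action into buf. In terse mode only the identifying data and
 * the stats are written; the static details are omitted.
 * Returns the number of characters snprintf() would have written. */
int ex_action_dump(const struct ex_action *a, bool terse, char *buf, size_t len)
{
    if (terse)
        return snprintf(buf, len, "kind=%s cookie=%llx packets=%llu",
                        a->kind, a->cookie, a->packets);

    return snprintf(buf, len, "kind=%s cookie=%llx packets=%llu params=%s",
                    a->kind, a->cookie, a->packets, a->params);
}
```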
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce terse dump flag · f8ab1807
      Vlad Buslov authored
      Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse
      filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option
      is intended to be used to improve performance of TC filter dump when
      userland only needs to obtain stats and not the whole classifier/action
      data. Extend struct tcf_proto_ops with new terse_dump() callback that must
      be defined by supporting classifier implementations.
      
      Support for this option in specific classifiers and actions is
      implemented in the following patches of the series.
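
The flag handling described above could be modelled roughly as below. Everything prefixed with EX_/ex_ is a hypothetical stand-in: the bit value chosen for the terse flag, and the outcomes for unknown flags or for a classifier without a terse_dump() callback, are assumptions made for the sketch and are not stated in this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_DUMP_FLAGS_TERSE (1u << 0)            /* assumed bit value */
#define EX_DUMP_FLAGS_ALL   EX_DUMP_FLAGS_TERSE  /* only one flag exists */

enum ex_dump_path {
    EX_DUMP_FULL,         /* regular dump callback */
    EX_DUMP_TERSE,        /* classifier's terse_dump() callback */
    EX_DUMP_UNSUPPORTED,  /* terse requested, classifier lacks the callback */
    EX_DUMP_BAD_FLAGS     /* unknown flag bits in the request */
};

/* Decide which dump path a request takes, given the requested flags and
 * whether the classifier defines a terse_dump() callback. */
enum ex_dump_path ex_select_dump(bool has_terse_dump, uint32_t flags)
{
    if (flags & ~EX_DUMP_FLAGS_ALL)
        return EX_DUMP_BAD_FLAGS;
    if (!(flags & EX_DUMP_FLAGS_TERSE))
        return EX_DUMP_FULL;
    return has_terse_dump ? EX_DUMP_TERSE : EX_DUMP_UNSUPPORTED;
}
```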
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: recursively find netdev by device node · 2e186a2c
      Tobias Waldekranz authored
      The assumption that a device node is associated either with the
      netdev's device, or the parent of that device, does not hold for all
      drivers. E.g. Freescale's DPAA has two layers of platform devices
      above the netdev. Instead, recursively walk up the tree from the
      netdev, allowing any parent to match against the sought after node.
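
The upward walk can be pictured with the simplified sketch below. The struct is a hypothetical stand-in for the kernel's device object (only a parent pointer and a node pointer), and the walk is written as a loop rather than recursion; it is not the code from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a device in the driver model (hypothetical). */
struct ex_device {
    struct ex_device *parent;  /* NULL at the top of the tree */
    const void *of_node;       /* associated device node, may be NULL */
};

/* Return true if dev or any of its ancestors is associated with node. */
bool ex_dev_matches_node(const struct ex_device *dev, const void *node)
{
    if (node == NULL)
        return false; /* never match devices that simply have no node */

    for (; dev != NULL; dev = dev->parent) {
        if (dev->of_node == node)
            return true;
    }
    return false;
}
```

With two layers of node-less devices above the netdev, as in the DPAA case mentioned above, a check limited to the netdev and its direct parent would miss the match, while the walk finds it on the third hop.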
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · d00f26b6
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2020-05-14
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.
      
      2) support for narrow loads in bpf_sock_addr progs and additional
         helpers in cg-skb progs, from Andrey.
      
      3) bpf benchmark runner, from Andrii.
      
      4) arm and riscv JIT optimizations, from Luke.
      
      5) bpf iterator infrastructure, from Yonghong.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'expand-cg_skb-helpers' · b92d44b5
      Alexei Starovoitov authored
      Andrey Ignatov says:
      
      ====================
      v2->v3:
      - better documentation for bpf_sk_cgroup_id in uapi (Yonghong Song)
      - save/restore errno in network helpers (Yonghong Song)
      - cleanup leftover after switching selftest to skeleton (Yonghong Song)
      - switch from map to skel->bss in selftest (Yonghong Song)
      
      v1->v2:
      - switch selftests to skeleton.
      
      This patch set allows a bunch of existing sk lookup and skb cgroup id
      helpers, and adds two new bpf_sk_{,ancestor_}cgroup_id helpers to be used
      in cgroup skb programs.
      
      It fills the gap to cover a use-case to apply intra-host cgroup-bpf network
      policy based on a source cgroup a packet comes from.
      
      For example, there can be multiple containers A, B, C running on a host.
      Every such container runs in its own cgroup that can have multiple
      sub-cgroups. But all these containers can share some IP addresses.
      
      At the same time container A wants to have a policy for a server S running
      in it so that only clients from this same container can connect to S, but
      not from other containers (such as B, C). Source IP address can't be used
      to decide whether to allow or deny a packet, but it looks reasonable to
      filter by cgroup id.
      
      The patch set makes it possible to implement the following policy:
      * when an ingress packet comes to the container's cgroup, look up the peer
        (client) socket this packet comes from;
      * having the peer socket, get its cgroup id;
      * compare the peer cgroup id with the self cgroup id and allow the packet
        only if they match, i.e. it comes from the same cgroup;
      * the "sub-cgroup" part of the story can be addressed by getting not the
        direct cgroup id of the peer socket, but the ancestor cgroup id on a
        specified level, similar to existing "ancestor" flavors of cgroup id
        helpers.
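
The decision logic of this policy can be modelled in plain C as below. This is not a BPF program: the cgroup hierarchy is represented by a hypothetical array of ids per level (level 0 being the root), and returning 0 for a level that does not exist is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_DEPTH 8

/* Path of cgroup ids from the root (ids[0]) down to the cgroup a socket
 * belongs to (ids[depth - 1]). Hypothetical model type. */
struct ex_cgroup_path {
    uint64_t ids[EX_MAX_DEPTH];
    int depth;
};

/* Id of the ancestor at the given level, or 0 if there is none. */
uint64_t ex_ancestor_id(const struct ex_cgroup_path *p, int level)
{
    if (level < 0 || level >= p->depth || level >= EX_MAX_DEPTH)
        return 0;
    return p->ids[level];
}

/* Allow the packet only if peer and self share the same ancestor cgroup
 * at the given level (the container's level in the hierarchy). */
bool ex_allow_packet(const struct ex_cgroup_path *peer,
                     const struct ex_cgroup_path *self, int level)
{
    uint64_t want = ex_ancestor_id(self, level);

    return want != 0 && ex_ancestor_id(peer, level) == want;
}
```

Comparing at the container's level rather than at the sockets' own level is what lets a client in one sub-cgroup of container A reach a server in another sub-cgroup of A, while a client from container B is still rejected.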
      
      A newly introduced selftest implements such a policy in its basic form to
      provide a better idea on the use-case.
      
      Patch 1 allows existing sk lookup helpers in cgroup skb.
      Patch 2 allows skb_ancestor_cgroup_id in cgroup skb.
      Patch 3 introduces two new helpers to get cgroup id of socket.
      Patch 4 extends network helpers to use them in the next patch.
      Patch 5 adds selftest / example of use-case.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test for sk helpers in cgroup skb · 68e916bc
      Andrey Ignatov authored
      Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and
      bpf_sk_ancestor_cgroup_id helpers from cgroup skb program.
      
      The test creates a testing cgroup, starts a TCPv6 server inside the
      cgroup and creates two client sockets: one inside testing cgroup and one
      outside.
      
      Then it attaches cgroup skb program to the cgroup that checks all TCP
      segments coming to the server and allows only those coming from the
      cgroup of the server. If a segment comes from a peer outside of the
      cgroup, it'll be dropped.
      
      Finally the test checks that the client inside the testing cgroup can
      successfully connect to the server, while the client outside the cgroup
      fails to connect and times out.
      
      The main goal of the test is to check newly introduced
      bpf_sk_{,ancestor_}cgroup_id helpers.
      
      It also checks a couple of socket lookup helpers (tcp & release), but
      lookup helpers were introduced much earlier and covered by other tests.
      Here it's mostly checked that they can be called from cgroup skb.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com
    • selftests/bpf: Add connect_fd_to_fd, connect_wait net helpers · 383724e1
      Andrey Ignatov authored
      Add two new network helpers.
      
      connect_fd_to_fd connects an already created client socket fd to address
      of server fd. Sometimes it's useful to separate client socket creation
      and connecting this socket to a server, e.g. if client socket has to be
      created in a cgroup different from that of server cgroup.
      
      Additionally connect_to_fd is now implemented using connect_fd_to_fd.
      Both helpers don't treat EINPROGRESS as an error and let the caller
      decide how to proceed with it.
      
      connect_wait is a helper to work with non-blocking client sockets so
      that if connect_to_fd or connect_fd_to_fd returned -1 with errno ==
      EINPROGRESS, caller can wait for connect to finish or for connection
      timeout. The helper returns -1 on error, 0 on timeout (1sec,
      hard-coded), and positive number on success.
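
A helper with the same return convention (-1 on error, 0 on timeout, positive on success) could look roughly like the sketch below. It is written with poll() and takes the timeout as a parameter, both of which are choices made for this illustration; the name ex_connect_wait is hypothetical and the selftest's actual implementation may differ.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Wait until a non-blocking connect on fd finishes.
 * Returns -1 on error (errno set), 0 on timeout, 1 on success. */
int ex_connect_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0;
    socklen_t len = sizeof(err);
    int n;

    /* The socket becomes writable once the connection attempt completes,
     * whether it succeeded or failed. */
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;

    /* SO_ERROR tells whether the completed attempt actually succeeded. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 1;
}
```

A caller would typically invoke it only after connect() returned -1 with errno == EINPROGRESS, passing 1000 to mirror the 1 second timeout mentioned above.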
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com
    • bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpers · f307fa2c
      Andrey Ignatov authored
      With the ability to look up sockets in cgroup skb programs, it becomes
      useful to access the cgroup id of retrieved sockets so that policies can
      be implemented based on the origin cgroup of such a socket.
      
      For example, a container running in a cgroup can have cgroup skb ingress
      program that can lookup peer socket that is sending packets to a process
      inside the container and decide whether those packets should be allowed
      or denied based on cgroup id of the peer.
      
      More specifically such ingress program can implement intra-host policy
      "allow incoming packets only from this same container and not from any
      other container on same host" w/o relying on source IP addresses since
      quite often it can be the case that containers share same IP address on
      the host.
      
      Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and
      bpf_sk_ancestor_cgroup_id().
      
      These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id
      helpers with the only difference that sk is used to get cgroup id
      instead of skb, and share code with them.
      
      See documentation in UAPI for more details.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
    • bpf: Allow skb_ancestor_cgroup_id helper in cgroup skb · 06d3e4c9
      Andrey Ignatov authored
      cgroup skb programs already can use bpf_skb_cgroup_id. Allow
      bpf_skb_ancestor_cgroup_id as well so that container policies can be
      implemented for a container that can have sub-cgroups dynamically
      created, while policies are still based on the cgroup id of the
      container itself, not on the id of a sub-cgroup.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/8874194d6041eba190356453ea9f6071edf5f658.1589486450.git.rdna@fb.com
    • bpf: Allow sk lookup helpers in cgroup skb · d56c2f95
      Andrey Ignatov authored
      Currently sk lookup helpers are allowed in tc, xdp, sk skb, and cgroup
      sock_addr programs.
      
      But they would be useful in cgroup skb as well, so that for example a
      cgroup skb ingress program can look up the peer socket a packet comes
      from on the same host and decide whether to allow or deny this packet
      based on the properties of that socket, e.g. the cgroup that the peer
      socket belongs to.
      
      Allow the following sk lookup helpers in cgroup skb:
      * bpf_sk_lookup_tcp;
      * bpf_sk_lookup_udp;
      * bpf_sk_release;
      * bpf_skc_lookup_tcp.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/f8c7ee280f1582b586629436d777b6db00597d63.1589486450.git.rdna@fb.com
    • Colin Ian King's avatar
      5b0004d9
    • bpf: Fix bpf_iter's task iterator logic · c70f34a8
      Andrii Nakryiko authored
      task_seq_get_next might stop prematurely if get_pid_task() fails to get
      task_struct. Failure to do so doesn't mean that there are no more tasks with
      higher pids. Procfs's iteration algorithm (see next_tgid in fs/proc/base.c)
      does a retry in such case. After this fix, instead of stopping prematurely
      after about 300 tasks on my server, bpf_iter program now returns >4000, which
      sounds much closer to reality.
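
The retry idea can be shown with a toy model in which a pid exists but obtaining its task may fail. The table type and function names are hypothetical and only illustrate why stopping at the first failed lookup undercounts; this is not the kernel's iterator code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy pid table entry, sorted by pid (hypothetical model). */
struct ex_pid_entry {
    int pid;
    bool task_available;  /* false models a failed task lookup */
};

/* Return the lowest pid >= start_pid whose task can be obtained, or -1.
 * A failed lookup does not end the search: the next higher pid is tried. */
int ex_next_task(const struct ex_pid_entry *tbl, size_t n, int start_pid)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].pid < start_pid)
            continue;
        if (!tbl[i].task_available)
            continue; /* retry with the next pid instead of stopping */
        return tbl[i].pid;
    }
    return -1;
}

/* Count all tasks reachable by repeatedly asking for the next one. */
size_t ex_count_tasks(const struct ex_pid_entry *tbl, size_t n)
{
    size_t count = 0;
    int pid = 0;

    for (;;) {
        int found = ex_next_task(tbl, n, pid);

        if (found < 0)
            break;
        count++;
        pid = found + 1;
    }
    return count;
}
```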
      
      Fixes: eaaacd23 ("bpf: Add task and task/file iterator targets")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200514055137.1564581-1-andriin@fb.com
    • selftests/bpf: Test narrow loads for bpf_sock_addr.user_port · 0645f7eb
      Andrey Ignatov authored
      Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr
      programs.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com
    • bpf: Support narrow loads from bpf_sock_addr.user_port · 7aebfa1b
      Andrey Ignatov authored
      bpf_sock_addr.user_port supports only 4-byte loads, which leads to ugly
      code in BPF programs, like:
      
      	volatile __u32 user_port = ctx->user_port;
      	__u16 port = bpf_ntohs(user_port);
      
      The volatile 4-byte load is needed since otherwise clang may optimize
      the load to be 2-byte, which is rejected by the verifier.
      
      Add support for 1- and 2-byte loads same way as it's supported for other
      fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
    • samples/bpf: xdp_redirect_cpu: Set MAX_CPUS according to NR_CPUS · 6a098154
      Lorenzo Bianconi authored
      xdp_redirect_cpu is currently failing in bpf_prog_load_xattr()
      allocating cpu_map map if CONFIG_NR_CPUS is less than 64 since
      cpu_map_alloc() requires max_entries to be less than NR_CPUS.
      Set cpu_map max_entries according to NR_CPUS in xdp_redirect_cpu_kern.c
      and get the currently running cpus in xdp_redirect_cpu_user.c.
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/374472755001c260158c4e4b22f193bdd3c56fb7.1589300442.git.lorenzo@kernel.org
    • r8169: don't include linux/moduleparam.h · 9b65d2ff
      Heiner Kallweit authored
      93882c6f ("r8169: switch from netif_xxx message functions to
      netdev_xxx") removed the last module parameter from the driver,
      therefore there's no need any longer to include linux/moduleparam.h.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: remove not needed checks in rtl8169_set_eee · aa443b3f
      Heiner Kallweit authored
      After 9de5d235 ("net: phy: fix aneg restart in phy_ethtool_set_eee")
      we don't need the check for aneg being enabled any longer, and as
      discussed with Russell configuring the EEE advertisement should be
      supported even if we're in a half-duplex mode currently.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: fix incorrect clamp calculation for burst · b014d043
      Colin Ian King authored
      Currently burst is clamped on rate and not on burst, so the assignment
      of burst from the clamping discards the previous assignment of burst.
      This looks like a cut-n-paste error from the previous clamping
      calculation on ramp. Fix this by replacing ramp with burst.
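
The bug pattern can be reproduced in isolation with the sketch below. The struct, function names and limits are hypothetical and do not come from the felix driver; the point is only that clamping the wrong variable silently overwrites the value computed just before.

```c
#include <stdint.h>

/* Clamp v into the inclusive range [lo, hi]. */
uint32_t ex_clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

struct ex_shaper {
    uint32_t rate;
    uint32_t burst;
};

/* Buggy variant: the second clamp is a copy of the first one and still
 * operates on rate, so the burst value passed in is discarded. */
struct ex_shaper ex_limit_buggy(uint32_t rate, uint32_t burst,
                                uint32_t rate_max, uint32_t burst_max)
{
    struct ex_shaper s;

    s.rate = ex_clamp_u32(rate, 1, rate_max);
    s.burst = burst;
    s.burst = ex_clamp_u32(s.rate, 1, burst_max); /* wrong variable */
    return s;
}

/* Fixed variant: each value is clamped against its own limit. */
struct ex_shaper ex_limit_fixed(uint32_t rate, uint32_t burst,
                                uint32_t rate_max, uint32_t burst_max)
{
    struct ex_shaper s;

    s.rate = ex_clamp_u32(rate, 1, rate_max);
    s.burst = ex_clamp_u32(burst, 1, burst_max);
    return s;
}
```

The discarded assignment in the buggy variant is the kind of dead store that a static analyzer reports as an unused value.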
      
      Addresses-Coverity: ("Unused value")
      Fixes: 0fbabf87 ("net: dsa: felix: add support Credit Based Shaper(CBS) for hardware offload")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mdio-moxart: remove unneeded include · 140ad6c8
      Bartosz Golaszewski authored
      mdio-moxart doesn't use regulators in the driver code. We can remove
      the regulator include.
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: dp83867: Convert DP83867 to yaml · 74ac28f1
      Dan Murphy authored
      Convert the dp83867 binding to yaml.
      Signed-off-by: Dan Murphy <dmurphy@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: dp83869: Update licensing info · e90b651e
      Dan Murphy authored
      Add BSD 2 Clause to the licensing.
      
      CC: Rob Herring <robh@kernel.org>
      Signed-off-by: Dan Murphy <dmurphy@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: update huawei ethernet driver maintainer · 3f044d26
      Luo bin authored
      Update the Huawei Ethernet driver maintainer from aviad to Bin luo.
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: add set_ringparam ethtool_ops support · bcab6782
      Luo bin authored
      Add support for changing the TX/RX queue depth with ethtool -G.
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: refactor end checks in devlink_nl_cmd_region_read_dumpit · 5a46b062
      Jakub Kicinski authored
      Clean up after recent fixes, move address calculations
      around and change the variable init, so that we can have
      just one start_offset == end_offset check.
      
      Make the check a little stricter to preserve the -EINVAL
      error if requested start offset is larger than the region
      itself.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'am65-cpsw-add-taprio-EST-offload-support' · c7ad3657
      David S. Miller authored
      Murali Karicheri says:
      
      ====================
      am65-cpsw: add taprio/EST offload support
      
      AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined
      in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018)
      configuration. EST allows express queue traffic to be scheduled
      (placed) on the wire at specific repeatable time intervals. In
      Linux kernel, EST configuration is done through tc command and
      the taprio scheduler in the net core implements a software only
      scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration,
      the user specifies "flags 2" in the command, which is then parsed by
      the taprio scheduler in net core and indicates that the command is to
      be offloaded to h/w. taprio then offloads the command to the
      driver by calling the ndo_setup_tc() ndo op. This patch implements
      ndo_setup_tc() as well as other changes required to offload EST
      configuration to CPSW h/w.

      For more details please refer to patch 2/2.
      
      This series is based on original work done by Ivan Khoronzhuk
      <ivan.khoronzhuk@linaro.org> to add taprio offload support to
      AM65 CPSW 2G.
      
      1. Example configuration 3 Gates
      
      ifconfig eth0 down
      ethtool -L eth0 tx 3
      
      ethtool --set-priv-flags eth0 p0-rx-ptype-rrobin off
      
      ifconfig eth0 192.168.2.20
      
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          num_tc 3 \
          map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \
          queues 1@0 1@1 1@2 \
          base-time 0000 \
          sched-entry S 4 125000 \
          sched-entry S 2 125000 \
          sched-entry S 1 250000 \
          flags 2
      
      2. Example configuration 8 Gates
      
      ifconfig eth0 down
      ethtool -L eth0 tx 8
      
      ethtool --set-priv-flags eth0 p0-rx-ptype-rrobin off
      
      ifconfig eth0 192.168.2.20
      
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time 0000 \
          sched-entry S 80 125000 \
          sched-entry S 40 125000 \
          sched-entry S 20 125000 \
          sched-entry S 10 125000 \
          sched-entry S 08 125000 \
          sched-entry S 04 125000 \
          sched-entry S 02 125000 \
          sched-entry S 01 125000 \
          flags 2
      
      Classify frames to particular priority using skbedit so that they land at
      a specific queue in cpsw h/w which is Gated by the EST gate which opens based
      on the sched-entry.
      
      tc qdisc add dev eth0 clsact
      
      In the example below, an iperf3 session with destination port 5007
      will go through Q7.
      
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5007 0xffff action skbedit priority 7
      
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5006 0xffff action skbedit priority 6
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5005 0xffff action skbedit priority 5
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5004 0xffff action skbedit priority 4
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5003 0xffff action skbedit priority 3
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5002 0xffff action skbedit priority 2
      tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5001 0xffff action skbedit priority 1
      
      iperf3 -c 192.168.2.10 -u -l1470 -b32M -t1 -p 5007
      
      Testing was done by capturing frames at the PC using wireshark and checking
      the burst interval or cycle time of UDP frames with a specific port number.
      Verified that the distance between the first frames of consecutive bursts
      (cycle-time) is 1 millisecond and the burst duration is within 125 usec,
      based on the received packet timestamps shown in the wireshark packet display.
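
The expected timing of the 8 gate example can be checked with the small model below. It assumes the sched-entry gate masks are hexadecimal bitmaps in which bit N opens the gate for traffic class N, and that intervals are in nanoseconds; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "sched-entry S <mask> <interval>" line (hypothetical model type). */
struct ex_sched_entry {
    uint8_t gate_mask;     /* bit N set: gate for traffic class N is open */
    uint32_t interval_ns;  /* how long this entry stays active */
};

/* The cycle time is the sum of all entry intervals. */
uint64_t ex_cycle_time_ns(const struct ex_sched_entry *e, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += e[i].interval_ns;
    return sum;
}

/* Is the gate for traffic class tc open at t_ns after base-time? */
bool ex_gate_open(const struct ex_sched_entry *e, size_t n,
                  unsigned int tc, uint64_t t_ns)
{
    uint64_t cycle = ex_cycle_time_ns(e, n);
    uint64_t off;

    if (cycle == 0 || tc > 7)
        return false;

    off = t_ns % cycle; /* the schedule repeats every cycle */
    for (size_t i = 0; i < n; i++) {
        if (off < e[i].interval_ns)
            return (e[i].gate_mask >> tc) & 1u;
        off -= e[i].interval_ns;
    }
    return false;
}
```

For the 8 gate schedule (masks 80, 40, ..., 01 with 125000 ns each) this gives a cycle time of 1000000 ns, with the gate for traffic class 7 open only during the first 125000 ns of every cycle, which is consistent with the 1 millisecond cycle and 125 usec burst observed in the capture.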
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethernet: ti: am65-cpsw-qos: add TAPRIO offload support · 8127224c
      Ivan Khoronzhuk authored
      AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined
      in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018)
      configuration. EST allows express queue traffic to be scheduled
      (placed) on the wire at specific repeatable time intervals. In
      Linux kernel, EST configuration is done through tc command and
      the taprio scheduler in the net core implements a software only
      scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration,
      the user specifies "flags 2" in the command, which is then parsed by
      the taprio scheduler in net core and indicates that the command is to
      be offloaded to h/w. taprio then offloads the command to the
      driver by calling the ndo_setup_tc() ndo op. This patch implements
      ndo_setup_tc() to offload EST configuration to CPSW h/w.
      
      Currently the driver supports only the SetGateStates operation. EST
      operates on a repeating time interval generated by the CPTS EST
      function generator. Each Ethernet port has a global EST fetch
      RAM that can be configured as 2 buffers, each of 64 locations,
      or one large buffer of 128 locations. In the 2 buffer configuration,
      a ping pong mechanism is used to hold the active schedule (oper)
      in one buffer and the new (admin) command in the other. Each 22-bit
      fetch command consists of a 14-bit fetch count (14 MSBs) and an
      8-bit priority fetch allow (8 LSBs) that will be applied for the
      fetch count time in wireside clocks. The driver processes each
      sched-entry in the offload command and updates the fetch RAM.
      The driver configures the duration in the sched-entry into the fetch
      count and the Gate mask into the priority fetch bits of the RAM. It
      then configures the CPTS EST function generator to activate the
      schedule. Currently the driver supports only the 2 buffer
      configuration, which means it supports a max cycle time of ~8 msec.
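
The fetch command layout described above can be sketched as follows. The macro and function names are hypothetical. The 8 ns per wireside clock used in the usage note is an assumption for a 1000 Mb link, not a figure given in this message.

```c
#include <stdint.h>

#define EX_FETCH_ALLOW_BITS 8
#define EX_FETCH_COUNT_MAX  ((1u << 14) - 1) /* largest 14-bit count: 16383 */

/* Pack a 22-bit fetch command: 14-bit fetch count in the upper bits,
 * 8-bit priority fetch allow (gate mask) in the lower bits. */
uint32_t ex_fetch_cmd(uint32_t count, uint8_t allow)
{
    return ((count & EX_FETCH_COUNT_MAX) << EX_FETCH_ALLOW_BITS) | allow;
}

uint32_t ex_fetch_count(uint32_t cmd)
{
    return (cmd >> EX_FETCH_ALLOW_BITS) & EX_FETCH_COUNT_MAX;
}

uint8_t ex_fetch_allow(uint32_t cmd)
{
    return (uint8_t)(cmd & 0xffu);
}

/* Longest schedule that fits in a buffer with the given number of
 * locations, if every command uses the maximum count and one wireside
 * clock lasts clock_ns nanoseconds. */
uint64_t ex_max_cycle_ns(uint32_t locations, uint32_t clock_ns)
{
    return (uint64_t)locations * EX_FETCH_COUNT_MAX * clock_ns;
}
```

Under the 8 ns assumption, one command covers at most 16383 * 8 = 131064 ns, and a 64 location buffer covers 8388096 ns, in line with the ~8 msec maximum cycle time stated above.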
      
      CPSW supports a configurable number of priority queues (up to 8)
      and needs to be switched to this mode from the default round
      robin mode before EST can be offloaded. User configures
      these through ethtool commands (-L for changing number of
      queues and --set-priv-flags to disable round robin mode).
      The driver doesn't enable EST if the pf_p0_rx_ptype_rrobin private flag
      is set. The flag is common for all ports, and so can't be just
      overridden by taprio configuration w/o user involvement.
      The command fails if pf_p0_rx_ptype_rrobin is already set in the
      driver.
      
      Scheds (commands) configuration depends on interface speed, so the
      driver translates the duration to the fetch count based on
      link speed. Each schedule can be constructed with several
      command entries in fetch RAM depending on interval. For example,
      if each sched has a timer interval < ~130us on a 1000 Mb link then
      each sched consumes one command and there is a 1:1 mapping. When the
      Ethernet link goes down, the driver purges the configuration if the
      link is down for more than 1 second.
      
      The patch updates the timer and scheds memory only if it's really
      needed, and skips cases that would require the user to stop the timer
      by configuring only the scheds memory.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethernet: ti: am65-cpts: add routines to support taprio offload · ec008fa2
      Ivan Khoronzhuk authored
      TAPRIO/EST offload support in CPSW2G requires the EST scheduler
      function to be enabled in CPTS. So this patch adds a function to
      set the cycle time for the EST scheduler. It also adds a function for
      getting the time in ns of the PHC clock for taprio qdisc configuration,
      mostly to verify whether a timer update is needed or to get the actual
      state of the oper/admin schedule.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 May, 2020 10 commits