1. 20 Dec, 2019 25 commits
  2. 19 Dec, 2019 12 commits
  3. 18 Dec, 2019 3 commits
    • Jose Abreu's avatar
      net: stmmac: tc: Fix TAPRIO division operation · a1ec57c0
      Jose Abreu authored
      For ARCHs that don't support 64 bits division we need to use the
      helpers.
      
      Fixes: b60189e0 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
      Signed-off-by: default avatarJose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1ec57c0
    • David S. Miller's avatar
      Merge branch 'ETS-qdisc' · 6bff0017
      David S. Miller authored
      Petr Machata says:
      
      ====================
      Add a new Qdisc, ETS
      
      The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
      transmission selection algorithms: strict priority, credit-based shaper,
      ETS (bandwidth sharing), and vendor-specific. All these have their
      corresponding knobs in DCB. But DCB does not have interfaces to configure
      RED and ECN, unlike Qdiscs.
      
      In the Qdisc land, strict priority is implemented by PRIO. Credit-based
      transmission selection algorithm can then be modeled by having e.g. TBF or
      CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
      placing a DRR Qdisc under the last PRIO band.
      
      The problem with this approach is that DRR on its own, as well as the
      combination of PRIO and DRR, are tricky to configure and tricky to offload
      to 802.1Qaz-compliant hardware. This is due to several reasons:
      
      - As any classful Qdisc, DRR supports adding classifiers to decide in which
        class to enqueue packets. Unlike PRIO, there's however no fallback in the
        form of priomap. A way to achieve classification based on packet priority
        is e.g. like this:
      
          # tc filter add dev swp1 root handle 1: \
      		basic match 'meta(priority eq 0)' flowid 1:10
      
        Expressing the priomap in this manner however forces drivers to deep dive
        into the classifier block to parse the individual rules.
      
        A possible solution would be to extend the classes with a "defmap" a la
        split / defmap mechanism of CBQ, and introduce this as a last resort
        classification. However, unlike priomap, this doesn't have the guarantee
        of covering all priorities. Traffic whose priority is not covered is
        dropped by DRR as unclassified. But ASICs tend to implement dropping in
        the ACL block, not in scheduling pipelines. The need to treat these
        configurations correctly (if only to decide to not offload at all)
        complicates a driver.
      
        It's not clear how to retrofit priomap with all its benefits to DRR
        without changing it beyond recognition.
      
      - The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
        all ETS TCs as a last resort. Switch ASICs that support ETS at all are
        likely to handle ETS traffic this way as well. However, the Linux model
        is more generic, allowing the DRR block in any band. Drivers would need
        to be careful to handle this case correctly, otherwise the offloaded
        model might not match the slow-path one.
      
        In a similar vein, PRIO and DRR need to agree on the list of priorities
        assigned to DRR. This is doubly problematic--the user needs to take care
        to keep the two in sync, and the driver needs to watch for any holes in
        DRR coverage and treat the traffic correctly, as discussed above.
      
        Note that at the time that DRR Qdisc is added, it has no classes, and
        thus any priorities assigned to that PRIO band are not covered. Thus this
        case is surprisingly rather common, and needs to be handled gracefully by
        the driver.
      
      - Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
        below it, it is not immediately clear which TC the class represents. This
        is unlike PRIO with its straightforward classid scheme. When DRR is
        combined with PRIO, the relationship between classes and TCs gets even
        more murky.
      
        This is a problem for users as well: the TC mapping is rather important
        for (devlink) shared buffer configuration and (ethtool) counters.
      
      So instead, this patch set introduces a new Qdisc, which is based on
      802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
      needs to specify how many bands there are, how many are strict and how many
      are ETS, quanta for the latter, and priomap.
      
      The new Qdisc operates like the PRIO / DRR combo would when configured as
      per the standard. The strict classes, if any, are tried for traffic first.
      When there's no traffic in any of the strict queues, the ETS ones (if any)
      are treated in the same way as in DRR.
      
      The chosen interface makes the overall system both reasonably easy to
      configure, and reasonably easy to offload. The extra code to support ETS in
      mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
      lines is bona fide new business logic.
      
      Credit-based shaping transmission selection algorithm can be configured by
      adding a CBS Qdisc under one of the strict bands (e.g. TBF can be used to a
      similar effect as well). As a non-work-conserving Qdisc, CBS can't be
      hooked under the ETS bands. This is detected and handled identically to DRR
      Qdisc at runtime. Note that offloading CBS is not subject of this patchset.
      
      The patchset proceeds in four stages:
      
      - Patches #1-#3 are cleanups.
      - Patches #4 and #5 contain the new Qdisc.
      - Patches #6 and #7 update mlxsw to offload the new Qdisc.
      - Patches #8-#10 add selftests for ETS.
      
      Examples:
      
      - Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:
      
          # tc qdisc add dev swp1 root handle 1: \
      	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
      
      - Tweak quantum of one of the classes of the previous Qdisc:
      
          # tc class ch dev swp1 classid 1:4 ets quantum 1000
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
          # tc class ch dev swp1 classid 1:3 ets quantum 1000
          Error: Strict bands do not have a configurable quantum.
      
      - Purely strict Qdisc with 1:1 mapping between priorities and TCs:
      
          # tc qdisc add dev swp1 root handle 1: \
      	ets strict 8 priomap 7 6 5 4 3 2 1 0
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
      
      - Use "bands" to specify number of bands explicitly. Underspecified bands
        are implicitly ETS and their quantum is taken from MTU. The following
        thus gives each band the same weight:
      
          # tc qdisc add dev swp1 root handle 1: \
      	ets bands 8 priomap 7 6 5 4 3 2 1 0
          # tc qdisc sh dev swp1
          qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
      
      v2:
      - This addresses points raised by David Miller.
      - Patch #4:
          - sch_ets.c: Add a comment with description of the Qdisc and the
            dequeuing algorithm.
          - Kconfig: Add a high-level description to the help blurb.
      
      v1:
      - No changes, first upstream submission after RFC.
      
      v3 (internal):
      - This addresses review from Jiri Pirko.
      - Patch #3:
          - Rename to _HR_ instead of to _HIERARCHY_.
      - Patch #4:
          - pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
          - pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
            _BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
          - sch_ets.c: Update to reflect the above changes. Add a new policy,
            ets_class_policy, which is used when parsing class changes.
            Currently that policy is the same as the quanta policy, but that
            might change.
          - sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
            caller that makes use of it.
          - sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
            attribute instead of returning an extack.
      - Patch #6:
          - __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
            function in this patch already. Drop the weight computation.
          - mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
            pass for the abovementioned "weights".
          - mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
            __mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
            directly from mlxsw_sp_setup_tc_prio().
          - Update to follow the _HIERARCHY_ -> _HR_ renaming.
      - Patch #7:
          - __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
            weight computation removal are now done in a previous patch.
          - mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
            earlier in the function.
      - Patch #3 (iproute2):
          - Add an example output to the commit message.
          - tc-ets.8: Fix output of two examples.
          - tc-ets.8: Describe default values of "bands", "quanta".
          - q_ets.c: A number of fixes in error messages.
          - q_ets.c: Comment formatting: /*padding*/ -> /* padding */
          - q_ets.c: parse_nbands: Move duplicate checking to callers.
          - q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.
      
      v2 (internal):
      - This addresses review from Ido Schimmel and comments from Alexander
        Kushnarov.
      - Patch #2:
          - s/coment/comment in the commit message.
      - Patch #4:
          - sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
          - ets_class_find(): RXTify
      - Patch #3 (iproute2):
          - tc-ets.8: some spelling fixes
          - tc-ets.8: add another example
          - tc.8: add an ETS to "CLASSFUL QDISCS" section
      
      v1 (internal):
      - This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
        by Alexander Petrovskiy and myself, and other improvements.
      - Patch #2:
          - Expand the explanation with an explicit example.
      - Patch #4:
          - Kconfig: s/sch_drr/sch_ets/
          - sch_ets: Reorder includes to be in alphabetical order
          - sch_ets: ets_quantum_parse(): Rename the return-pointer argument
            from pquantum to quantum, and use it directly, not going through a
            local temporary.
          - sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
            argument "quanta" from an array to a pointer.
          - sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
          - sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
            __nla_validate_nested directly instead of nl80211_validate_nested().
          - sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
            instead of returning an extack.
          - sch_ets: ets_qdisc_change(): Make the last band the default one for
            unmentioned priomap priorities.
          - sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
            band notified its ETS parent.
          - sch_ets: When ungrafting, add the newly-created invisible FIFO to
            the Qdisc hash
      - Patch #5:
          - pkt_cls.h: Note that quantum=0 signifies a strict band.
          - Fix error path handling when ets_offload_dump() fails.
      - Patch #6:
          - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
            "quanta" and "priomap" from arrays to pointers.
      - Patch #7:
          - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
            "weights" from an array to a pointer.
      - Patch #9:
          - mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
          - Adjust the whole suite to allow testing of traffic classifiers
            in addition to testing priomap.
      - Patch #10:
          - Add a number of new tests to test default priomap band, overlarge
            number of bands, zeroes in quanta, and altogether missing quanta.
      - Patch #1 (iproute2):
          - State motivation for inclusion of this patch in the patcheset in the
            commit message.
      - Patch #3 (iproute2):
          - tc-ets.8: it is now December
          - tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
          - tc-ets.8: s/flow/band in explanation of quantum
          - tc-ets.8: explain what happens with priorities not covered by priomap
          - tc-ets.8: default priomap band is now the last one
          - q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
            priomap and quanta.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6bff0017
    • Petr Machata's avatar
      selftests: qdiscs: Add test coverage for ETS Qdisc · 82c664b6
      Petr Machata authored
      Add TDC coverage for the new ETS Qdisc.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      82c664b6