1. 28 Jul, 2021 24 commits
    • net: bridge: switchdev: treat local FDBs the same as entries towards the bridge · 52e4bec1
      Vladimir Oltean authored
      Currently the following script:
      
      1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
      2. ip link set swp2 up && ip link set swp2 master br0
      3. ip link set swp3 up && ip link set swp3 master br0
      4. ip link set swp4 up && ip link set swp4 master br0
      5. bridge vlan del dev swp2 vid 1
      6. bridge vlan del dev swp3 vid 1
      7. ip link set swp4 nomaster
      8. ip link set swp3 nomaster
      
      produces the following output:
      
      [  641.010738] sja1105 spi0.1: port 2 failed to delete 00:1f:7b:63:02:48 vid 1 from fdb: -2
      
      [ swp2, swp3 and br0 all have the same MAC address, the one listed above ]
      
      In short, this happens because the number of FDB entry additions
      notified to switchdev is unbalanced with the number of deletions.
      
      At step 1, the bridge has a random MAC address. At step 2, the
      br_fdb_replay of swp2 receives this initial MAC address. Then the bridge
      inherits the MAC address of swp2 via br_fdb_change_mac_address(), and it
      notifies switchdev (only swp2 at this point) of the deletion of the
      random MAC address and the addition of 00:1f:7b:63:02:48 as a local FDB
      entry with fdb->dst == swp2, in VLANs 0 and the default_pvid (1).
      
      During step 7:
      
      del_nbp
      -> br_fdb_delete_by_port(br, p, vid=0, do_all=1);
         -> fdb_delete_local(br, p, f);
      
      br_fdb_delete_by_port() deletes all entries towards the port,
      regardless of vid, because do_all is 1.
      
      fdb_delete_local() has logic to migrate local FDB entries deleted from
      one port to another port which shares the same MAC address and is in the
      same VLAN, or to the bridge device itself. This migration happens
      without notifying switchdev of the deletion on the old port and the
      addition on the new one; only fdb->dst is changed and the added_by_user
      flag is cleared.
      
      In the example above, the del_nbp(swp4) causes the
      "addr 00:1f:7b:63:02:48 vid 1" local FDB entry with fdb->dst == swp4
      that existed up until then to be migrated directly towards the bridge
      (fdb->dst == NULL). This is because it cannot be migrated to any of the
      other ports (swp2 and swp3 are not in VLAN 1).
      
      After the migration to br0 takes place, swp4 requests a deletion replay
      of all FDB entries. Since the "addr 00:1f:7b:63:02:48 vid 1" entry now
      points towards the bridge, a deletion of it is replayed. There was just
      a prior addition of this address, so the switchdev driver deletes this
      entry.
      
      Then, the del_nbp(swp3) at step 8 triggers another br_fdb_replay, and
      switchdev is notified again to delete "addr 00:1f:7b:63:02:48 vid 1".
      But it can't because it no longer has it, so it returns -ENOENT.
      
      There are other possibilities to trigger this issue, but this is by far
      the simplest to explain.
      
      To fix this, we must avoid the situation where the addition of an FDB
      entry is notified to switchdev as a local entry on a port, and the
      deletion is notified on the bridge itself.
      
      Considering that the 2 types of FDB entries are completely equivalent
      and we cannot have the same MAC address as a local entry on 2 bridge
      ports, or on a bridge port and pointing towards the bridge at the same
      time, it makes sense to hide away from switchdev completely the fact
      that a local FDB entry is associated with a given bridge port at all.
      Just say that it points towards the bridge: it should make no difference
      whatsoever to the switchdev driver and should even lead to a simpler
      overall implementation, with fewer cases to handle.
      
      This also avoids any modification at all to the core bridge driver, just
      what is reported to switchdev changes. With the local/permanent entries
      on bridge ports being already reported to user space, it is hard to
      believe that the bridge behavior can change in any backwards-incompatible
      way such as making all local FDB entries point towards the bridge.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
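The imbalance described in this commit can be reproduced with a toy model. The sketch below is not kernel code: `ToyDriver`, `simulate()` and the reference-counting behaviour of the driver are assumptions made for illustration. Only the port names, the order of the steps and the -ENOENT outcome come from the commit message.

```python
ENOENT = 2
BRIDGE = None  # stands in for fdb->dst == NULL ("towards the bridge")


class ToyDriver:
    """Hypothetical driver that reference-counts one host address."""

    def __init__(self):
        self.refs = 0
        self.errors = []

    def add(self):
        self.refs += 1

    def delete(self):
        if self.refs == 0:
            # Nothing left to delete: the "-2" seen in the log above.
            self.errors.append(-ENOENT)
        else:
            self.refs -= 1


def simulate(local_as_bridge):
    drv = ToyDriver()

    # Step 2: br0 inherits the MAC of swp2; the entry is local on swp2
    # and is notified to the only member port at that point.
    dst = "swp2"
    drv.add()

    def replayed(port):
        # New behaviour: every entry is replayed and local entries are
        # reported as pointing towards the bridge.
        # Old behaviour: only entries towards this port or the bridge.
        return local_as_bridge or dst in (port, BRIDGE)

    # Steps 3-4: swp3 and swp4 join; each join triggers an addition replay.
    for port in ("swp3", "swp4"):
        if replayed(port):
            drv.add()

    # Steps 5-6: the VID 1 entry ends up on swp4 without any notification.
    dst = "swp4"

    # Step 7: del_nbp(swp4) silently migrates the entry to the bridge
    # before the deletion replay runs.
    dst = BRIDGE

    # Steps 7-8: swp4, then swp3, leave; each leave triggers a deletion replay.
    for port in ("swp4", "swp3"):
        if replayed(port):
            drv.delete()
    return drv


old = simulate(local_as_bridge=False)
new = simulate(local_as_bridge=True)
print(old.errors, new.errors, new.refs)
```

In the old model a single addition is followed by two deletions, so the second deletion fails. In the new model every join adds one reference and every leave drops one, and the reference held for swp2 (which is still in the bridge) survives.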
    • net: bridge: switchdev: replay the entire FDB for each port · b4454bc6
      Vladimir Oltean authored
      Currently when a switchdev port joins a bridge, we replay all FDB
      entries pointing towards that port or towards the bridge.
      
      However, this is insufficient in certain situations:
      
      (a) DSA, through its assisted_learning_on_cpu_port logic, snoops
          dynamically learned FDB entries on foreign interfaces.
          These are FDB entries that are pointing neither towards the newly
          joined switchdev port, nor towards the bridge. So these addresses
          would be missed when joining a bridge where a foreign interface has
          already learned some addresses, and they would also linger on if the
          DSA port leaves the bridge before the foreign interface forgets them.
          None of this happens if we replay the entire FDB when the port joins.
      
      (b) There is a desire to treat local FDB entries on a port (i.e. the
          port's termination MAC address) identically to FDB entries pointing
          towards the bridge itself. More details on the reason behind this in
          the next patch. The point is that this cannot be done given the
          current structure of br_fdb_replay() in this situation:
            ip link set swp0 master br0  # br0 inherits its MAC address from swp0
            ip link set swp1 master br0
          What is desirable is that when swp1 joins the bridge, br_fdb_replay()
          also notifies swp1 of br0's MAC address, but this won't in fact
          happen because the MAC address of br0 does not have fdb->dst == NULL
          (it doesn't point towards the bridge), but it has fdb->dst == swp0.
          So our current logic makes it impossible for that address to be
          replayed. But if we dump the entire FDB instead of just the entries
          with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC
          address of br0 will be replayed too, which is what we need.
      
      A natural question arises: say there is an FDB entry to be replayed,
      like a MAC address dynamically learned on a foreign interface that
      belongs to a bridge where no switchdev port has joined yet. If 10
      switchdev ports belonging to the same driver join this bridge, one by
      one, won't every port get notified 10 times of the foreign FDB entry,
      amounting to a total of 100 notifications for this FDB entry in the
      switchdev driver?
      
      Well, yes, but this is where the "void *ctx" argument for br_fdb_replay
      is useful: every port of the switchdev driver is notified whenever any
      other port requests an FDB replay, but because the replay was initiated
      by a different port, its context is different from the initiating port's
      context, so it ignores those replays.
      
      So the foreign FDB entry will be installed only 10 times, once per port.
      This is done so that the following 4 code paths are always well balanced:
      (a) addition of foreign FDB entry is replayed when port joins bridge
      (b) deletion of foreign FDB entry is replayed when port leaves bridge
      (c) addition of foreign FDB entry is notified to all ports currently in bridge
      (d) deletion of foreign FDB entry is notified to all ports currently in bridge
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
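The "100 notifications, 10 installs" arithmetic from this commit can be checked with a small model. This is a sketch under stated assumptions: `ToyPort`, `fdb_add_event()` and `replay_fdb()` are hypothetical names, and the MAC address is made up. The only idea taken from the commit message is that a port ignores a replay whose context is not its own.

```python
class ToyPort:
    """Hypothetical switchdev port that filters replays by context."""

    def __init__(self, name):
        self.name = name
        self.notified = 0
        self.installed = set()

    def fdb_add_event(self, mac, ctx):
        self.notified += 1
        if ctx is not self:
            # Replay initiated by another port: ignore it.
            return
        self.installed.add(mac)


def replay_fdb(driver_ports, joining_port, fdb):
    # Every port of the driver hears the replay; ctx names the initiator.
    for mac in fdb:
        for port in driver_ports:
            port.fdb_add_event(mac, ctx=joining_port)


ports = [ToyPort(f"swp{i}") for i in range(10)]
foreign_fdb = ["00:11:22:33:44:55"]  # made-up address on a foreign interface

# The 10 ports join the bridge one by one; each join replays the FDB.
for port in ports:
    replay_fdb(ports, port, foreign_fdb)

total_notifications = sum(p.notified for p in ports)
total_installs = sum(len(p.installed) for p in ports)
print(total_notifications, total_installs)  # prints: 100 10
```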
    • Merge branch 'bnxt_en-ptp' · 1159da64
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: PTP enhancements
      
      This series adds two PTP enhancements.  The first one is to register
      the PHC during probe time and keep it registered whether it is in
      ifup or ifdown state.  It will get unregistered and possibly
      reregistered if the firmware PTP capability changes after firmware
      reset.  The second one is to add the 1PPS (one pulse per second)
      feature to support input/output of the 1PPS signal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Log if an invalid signal detected on TSIO pin · abf90ac2
      Pavan Chebbi authored
      The FW can report to the driver via an ASYNC event if it encountered an
      invalid signal on any TSIO pin. The driver will log this event
      so that the user can take corrective action.
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Arvind Susarla <arvind.susarla@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Event handler for PPS events · 099fdeda
      Pavan Chebbi authored
      Once the PPS pins are configured, the FW can report
      PPS values using an ASYNC event. This patch adds the
      ASYNC event handler and the subsequent reporting of the
      events to the kernel.
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: 1PPS functions to configure TSIO pins · 9e518f25
      Pavan Chebbi authored
      Applications will send ioctls to set/clear PPS pin functions
      based on user input. This patch implements the driver
      callbacks that will configure the TSIO pins using firmware
      commands. After a firmware reset, the TSIO pins will be
      reconfigured.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: 1PPS support for 5750X family chips · caf3eedb
      Pavan Chebbi authored
      1PPS (One Pulse Per Second) is a signal generated either
      by the NIC PHC or an external timing source.
      Integrating the support to configure and use 1PPS using
      the TSIO pins along with PTP timestamps will add Grand
      Master capability to the 5750X family chipsets.
      
      This patch initializes the driver data structures and
      registers the 1PPS with the kernel, based on the TSIO pins'
      capability in the hardware. This will create a /dev/ppsX
      device which applications can use to receive PPS events.
      
      Later patches will define functions to configure and use
      the pins.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Do not read the PTP PHC during chip reset · 30e96f48
      Michael Chan authored
      During error recovery or hot firmware upgrade, the chip may be under
      reset and the PHC register read cycles may cause completion timeouts.
      Check that the chip is not under reset condition before proceeding
      to read the PHC by checking the flag BNXT_STATE_IN_FW_RESET.  We also
      need to take the ptp_lock before we set this flag to prevent race
      conditions.
      
      We need this logic because the PHC now will stay registered after
      bnxt_close().
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
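The locking rule in this commit (check the reset flag before reading the PHC, and take ptp_lock before setting the flag) can be sketched as follows. This is an illustration, not the driver's code: `ToyPtp` and its methods are hypothetical, the flag and lock only borrow the names BNXT_STATE_IN_FW_RESET and ptp_lock from the commit message, and returning `None` for a skipped read is an assumption.

```python
import threading


class ToyPtp:
    """Hypothetical model of a PHC that must not be read during reset."""

    def __init__(self):
        self.ptp_lock = threading.Lock()
        self.in_fw_reset = False  # stands in for BNXT_STATE_IN_FW_RESET
        self.hw_reads = 0         # counts simulated register read cycles

    def begin_fw_reset(self):
        # Taking the lock first means no reader can be in the middle of
        # a register read when the flag flips.
        with self.ptp_lock:
            self.in_fw_reset = True

    def end_fw_reset(self):
        with self.ptp_lock:
            self.in_fw_reset = False

    def read_phc(self):
        with self.ptp_lock:
            if self.in_fw_reset:
                # Skip the read; touching the hardware here could time out.
                return None
            self.hw_reads += 1
            return self.hw_reads  # placeholder for the PHC value


ptp = ToyPtp()
before = ptp.read_phc()
ptp.begin_fw_reset()
during = ptp.read_phc()
ptp.end_fw_reset()
after = ptp.read_phc()
```

Because the flag is only changed under the same lock that readers hold, a reader either completes its read before the reset starts or sees the flag and backs off.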
    • bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one() · a521c8a0
      Michael Chan authored
      It was pointed out by Richard Cochran that registering the PHC during
      probe is better than during ifup, so move bnxt_ptp_init() back to
      bnxt_init_one().  In order to work correctly after firmware reset which
      may result in PTP config. changes, we modify bnxt_ptp_init() to return
      if the PHC has been registered earlier.  If PTP is no longer supported
      by the new firmware, we will unregister the PHC and clean up.
      
      This partially reverts:
      
      d7859afb ("bnxt_en: Move bnxt_ptp_init() to bnxt_open()")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fec-next' · 63caca1e
      David S. Miller authored
      Joakim Zhang says:
      
      ====================
      net: fec: add support for i.MX8MQ and i.MX8QM
      
      This patch set adds support for i.MX8MQ and i.MX8QM, both of which add new features.
      
      ChangeLogs:
      V1->V2:
      	* rebase on schema binding, and update dts compatible string.
      	* use generic ethernet controller property for MAC internal RGMII clock delay
      	  rx-internal-delay-ps and tx-internal-delay-ps
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64: dts: imx8qxp: add "fsl,imx8qm-fec" compatible string for FEC · 987e1b96
      Joakim Zhang authored
      Add "fsl,imx8qm-fec" compatible string for FEC to support new feature
      (RGMII delayed clock).
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64: dts: imx8m: add "fsl,imx8mq-fec" compatible string for FEC · a758dee8
      Joakim Zhang authored
      Add "fsl,imx8mq-fec" compatible string for FEC to support new feature
      (IEEE 802.3az EEE standard).
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: add MAC internal delayed clock feature support · fc539459
      Fugang Duan authored
      The i.MX8QM ENET IP version supports a timing specification in which
      the MAC integrates the clock delay in RGMII mode, offering delayed
      TXC/RXC as an alternative option to work well with various PHYs.
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: add eee mode tx lpi support · b82f8c3f
      Fugang Duan authored
      The i.MX8MQ ENET version supports IEEE 802.3az EEE mode; add
      EEE mode Tx LPI enable so it can be set through the ethtool interface.
      
      usage:
      1. set sleep and wake timer to 5ms:
      ethtool --set-eee eth0 eee on tx-lpi on tx-timer 5000
      2. check the eee mode:
      ~# ethtool --show-eee eth0
      EEE Settings for eth0:
              EEE status: enabled - active
              Tx LPI: 5000 (us)
              Supported EEE link modes:  100baseT/Full
                                         1000baseT/Full
              Advertised EEE link modes:  100baseT/Full
                                          1000baseT/Full
              Link partner advertised EEE link modes:  100baseT/Full
      
      Note: For real-time and IEEE 1588 PTP use cases, EEE mode should
      be disabled.
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: add imx8mq and imx8qm new versions support · 947240eb
      Fugang Duan authored
      The ENET blocks of imx8mq and imx8qm are basically the same as on imx6sx,
      but they add new features on top of it:
      - imx8mq: supports IEEE 802.3az EEE standard.
      - imx8qm: supports RGMII mode delayed clock.
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: fsl,fec: add RGMII internal clock delay · df11b807
      Joakim Zhang authored
      Add RGMII internal clock delay for FEC controller.
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: fsl,fec: update compatible items · 5d886947
      Joakim Zhang authored
      Add more compatible items for i.MX8/8M platforms.
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: Add control-plane selftest for skbmod SKBMOD_F_ECN option · 68f98848
      Peilin Ye authored
      Recently we added a new option, SKBMOD_F_ECN, to tc-skbmod(8).  Add a
      control-plane selftest for it.
      
      Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
      support", as well as iproute2 patch "tc/skbmod: Introduce SKBMOD_F_ECN
      option".
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_skbmod: Add SKBMOD_F_ECN option support · 56af5e74
      Peilin Ye authored
      Currently, when doing rate limiting using the tc-police(8) action, the
      easiest way is to simply drop the packets which exceed or conform to the
      configured bandwidth limit.  Add a new option to tc-skbmod(8), so that
      users may use the ECN [1] extension to explicitly inform the receiver
      about the congestion instead of dropping packets "on the floor".
      
      The 2 least significant bits of the IPv4 ToS (DS) field and the IPv6
      Traffic Class field are used to represent different ECN states [2]:
      
      	0b00: "Non ECN-Capable Transport", Non-ECT
      	0b10: "ECN Capable Transport", ECT(0)
      	0b01: "ECN Capable Transport", ECT(1)
      	0b11: "Congestion Encountered", CE
      
      As an example:
      
      	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
      		matchall action skbmod ecn
      
      Doing the above marks all ECT(0) and ECT(1) packets as CE.  It does NOT
      affect Non-ECT or non-IP packets.  In the tc-police scenario mentioned
      above, users may pipe a tc-police action and a tc-skbmod "ecn" action
      together to achieve ECN-based rate limiting.
      
      For TCP connections, upon receiving a CE packet, the receiver will respond
      with an ECE packet, asking the sender to reduce their congestion window.
      However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and
      our implementation does not touch or care about L4 headers.
      
      The updated tc-skbmod SYNOPSIS looks like the following:
      
      	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
      
      Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
      command.  Trying to use more than one of them at a time is considered
      undefined behavior; pipe multiple tc-skbmod commands together instead.
      "set" and "swap" only affect Ethernet packets, while "ecn" only affects
      IPv{4,6} packets.
      
      It is also worth mentioning that, in theory, the same effect could be
      achieved by piping a "police" action and a "bpf" action using the
      bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the
      user, which makes it impractical.
      
      Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets".
      
      [1] https://datatracker.ietf.org/doc/html/rfc3168
      [2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
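The marking rule stated in this commit (ECT(0) and ECT(1) become CE, Non-ECT is left alone) comes down to a two-bit operation. The sketch below is only an illustration of that rule, not the act_skbmod code: `mark_ce()` is a hypothetical helper, and it works on a bare ToS / Traffic Class byte rather than on a packet. The bit values are the ones listed in the commit message.

```python
# ECN codepoints, as listed in the commit message.
NON_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11

ECN_MASK = 0b11  # the 2 least significant bits of the byte


def mark_ce(tos):
    """Return the ToS / Traffic Class byte with CE set, if ECN-capable."""
    if tos & ECN_MASK == NON_ECT:
        # Not ECN-capable: leave the byte untouched.
        return tos
    # ECT(0), ECT(1) and CE all end up as CE; the upper 6 bits are kept.
    return tos | CE


dscp_ef = 0xB8  # example upper bits (DSCP EF), chosen for illustration
print(bin(mark_ce(dscp_ef | ECT_0)), bin(mark_ce(dscp_ef | NON_ECT)))
```

The upper six bits of the byte pass through unchanged, so only the ECN codepoint is affected.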
    • nfp: flower-ct: fix error return code in nfp_fl_ct_add_offload() · d80f6d66
      Yang Yingliang authored
      If nfp_tunnel_add_ipv6_off() fails, nfp_fl_ct_add_offload() should
      return an error code.
      
      Fixes: 5a2b9304 ("nfp: flower-ct: compile match sections of flow_payload")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'devlink-register' · 3bdc7066
      David S. Miller authored
      Leon Romanovsky says:
      
      ====================
      Remove duplicated devlink registration check
      
      Changelog:
      v1:
       * Added two new patches that remove registration field from mlx5 and ti drivers.
      v0: https://lore.kernel.org/lkml/ed7bbb1e4c51dd58e6035a058e93d16f883b09ce.1627215829.git.leonro@nvidia.com
      
      --------------------------------------------------------------------
      
      Both the registered flag and the devlink pointer are set at the same
      time and indicate the same thing: devlink/devlink_port are ready.
      Instead of checking ->registered, use the devlink pointer as an indication.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Remove duplicated registration check · d7907a2b
      Leon Romanovsky authored
      Both the registered flag and the devlink pointer are set at the same
      time and indicate the same thing: devlink/devlink_port are ready.
      Instead of checking ->registered, use the devlink pointer as an indication.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Don't rely on always true registered field · 35f69867
      Leon Romanovsky authored
      Devlink is an integral part of the mlx5 driver and all flows ensure that
      devlink_*_register() will succeed. That makes the ->registered check
      obsolete.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ti: am65-cpsw-nuss: fix wrong devlink release order · acf34954
      Leon Romanovsky authored
      The commit that introduced devlink support released devlink resources in
      the wrong order, which made the unwind flow asymmetrical. In addition,
      am65-cpsw-nuss used a field internal to the devlink core: registered.
      
      In order to fix the unwind flow and remove such access to the
      registered field, rewrite the code to call devlink_port_unregister only
      on registered ports.
      
      Fixes: 58356eb3 ("net: ti: am65-cpsw-nuss: Add devlink support")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
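A symmetric unwind of the kind this commit asks for can be sketched generically. The code below is not the am65-cpsw-nuss implementation: `register_ports()` and the toy callbacks are hypothetical, and the failing port is made up. It only illustrates two points from the commit message: unregister only what was actually registered, and release in the reverse of the registration order, without consulting any core-internal flag.

```python
def register_ports(ports, register, unregister):
    """Register all ports, unwinding symmetrically on failure."""
    registered = []  # the caller's own record, not a core-internal flag
    try:
        for port in ports:
            register(port)
            registered.append(port)
    except Exception:
        # Undo only what succeeded, in reverse order.
        for port in reversed(registered):
            unregister(port)
        raise
    return registered


log = []


def toy_register(port):
    if port == "port2":  # made-up failure on the third port
        raise RuntimeError("registration failed")
    log.append(("register", port))


def toy_unregister(port):
    log.append(("unregister", port))


try:
    register_ports(["port0", "port1", "port2"], toy_register, toy_unregister)
except RuntimeError:
    pass

print(log)
```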
  2. 27 Jul, 2021 16 commits