1. 29 Jun, 2021 20 commits
    • ptp: Set lookup cookie when creating a PTP PPS source. · 8602e40f
      Jonathan Lemon authored
      When creating a PTP device, the configuration block allows
      creation of an associated PPS device.  However, there isn't
      any way to associate the two devices after creation.
      
      Set the PPS cookie, so pps_lookup_dev(ptp) performs correctly.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'inet-sk_error-tracers' · c79fa61c
      David S. Miller authored
      Alexander Aring says:
      
      ====================
      net: sock: add tracers for inet socket errors
      
      This patch series introduces tracers for sk_error_report socket callback
      calls. The use case is that a user space application can monitor them
      and apply its own heuristic about bad peer connections, even across a
      socket's lifetime. As a specific example, it could be used in the Linux
      cluster world to fence a badly behaving node. For now it's okay to only
      trace inet sockets. Other socket families can introduce their own tracers
      easily.
      
      Example output with trace-cmd:
      
      <idle>-0     [003]   201.799437: inet_sk_error_report: family=AF_INET protocol=IPPROTO_TCP sport=21064 dport=38941 saddr=192.168.122.57 daddr=192.168.122.251 saddrv6=::ffff:192.168.122.57 daddrv6=::ffff:192.168.122.251 error=104
      
      - Alex
      
      changes since v2:
      
      - change "sk.sk_error_report(&ipc->sk);" to "sk_error_report(&ipc->sk);"
        in net/qrtr/qrtr.c
      ====================
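The example trace line in the cover letter is a sequence of key=value fields, so a user space monitor could split it roughly as follows. This is a minimal sketch in Python, not part of the series: the helper name parse_event and the choice of which fields to convert to integers are assumptions made for illustration.

```python
import re

# Sample line copied from the trace-cmd output in the cover letter.
SAMPLE = ("<idle>-0     [003]   201.799437: inet_sk_error_report: "
          "family=AF_INET protocol=IPPROTO_TCP sport=21064 dport=38941 "
          "saddr=192.168.122.57 daddr=192.168.122.251 "
          "saddrv6=::ffff:192.168.122.57 daddrv6=::ffff:192.168.122.251 error=104")

EVENT = "inet_sk_error_report:"


def parse_event(line):
    """Return the key=value fields of one inet_sk_error_report line, or None.

    parse_event is a hypothetical helper, not an API of trace-cmd.
    """
    _, sep, payload = line.partition(EVENT)
    if not sep:
        return None  # some other event, or not a trace line at all
    fields = dict(re.findall(r"(\w+)=(\S+)", payload))
    # Assumption: ports and the error code are the fields worth having as ints.
    for key in ("sport", "dport", "error"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


event = parse_event(SAMPLE)
```

A monitor could then count events per (daddr, error) pair and apply whatever fencing heuristic it likes; that policy is deliberately left out here.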
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock: add trace for socket errors · e6a3e443
      Alexander Aring authored
      This patch adds tracers to trace inet socket errors only. A user
      space monitor application can track connection errors independent of
      socket lifetime and do additional handling. For example, a cluster
      manager can fence a node if errors occur according to a specific heuristic.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock: introduce sk_error_report · e3ae2365
      Alexander Aring authored
      This patch introduces a function wrapper to call the sk_error_report
      callback. That will prepare to add additional handling whenever
      sk_error_report is called, for example to trace socket errors.
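The wrapper idea can be modelled in a few lines. The sketch below is a toy Python model rather than kernel code: the Sock class, the trace_log list and the order of the two steps are assumptions chosen to show why a single entry point makes it easy to hook in extra handling.

```python
class Sock:
    """Toy stand-in for a socket; only what this sketch needs."""

    def __init__(self, on_error):
        self.sk_err = 0
        self._error_cb = on_error  # plays the role of the per-socket callback


# Hypothetical sink standing in for a tracepoint.
trace_log = []


def sk_error_report(sk):
    """Single entry point: invoke the callback, then do the extra handling."""
    sk._error_cb(sk)
    trace_log.append(sk.sk_err)  # extra handling lives in exactly one place
```

Callers that previously invoked the callback directly would call the wrapper instead, so any later addition (such as tracing) reaches every call site at once.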
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-rx-filtering' · 7f4e5c5b
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      RX filtering in DSA
      
      This is my fourth stab (identical to the third one except sent as
      non-RFC) at creating a list of unicast and multicast addresses that the
      DSA CPU ports must trap. I am reusing a lot of Tobias's work which he
      submitted here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210116012515.3152-1-tobias@waldekranz.com/
      
      My additions to Tobias' work come in the form of taking some care that
      additions and removals of host addresses are properly balanced, so that
      we can do reference counting on them for cross-chip setups and multiple
      bridges spanning the same switch (I am working on an NXP board where
      both are real requirements).
      
      During the last attempted submission of multiple CPU ports for DSA:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210410133454.4768-1-ansuelsmth@gmail.com/
      
      it became clear that the concept of multiple CPU ports would not be
      compatible with the idea of address learning on those CPU ports (when
      those CPU ports are statically assigned to user ports, not in a LAG)
      unless the switch supports complete FDB isolation, which most switches
      do not. So DSA needs to manage in software all addresses that are
      installed on the CPU port(s), which is what this patch set does.
      
      Compared to all earlier attempts, this series does not fiddle with how
      DSA operates the ports in standalone mode at all, just when bridged.
      We need to sort that out properly, then any optimization that comes in
      standalone mode (i.e. IFF_UNICAST_FLT) can come later.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: replay the local bridge FDB entries pointing to the bridge dev too · 63c51453
      Vladimir Oltean authored
      When we join a bridge that already has some local addresses pointing to
      itself, we do not get notifications for those entries. Similarly, when we
      leave that bridge, we do not get notifications for the deletion of those entries.
      The only switchdev notifications we get are those of entries added while
      the DSA port is enslaved to the bridge.
      
      This makes use cases such as the following work properly (with the
      number of additions and removals properly balanced):
      
      ip link add br0 type bridge
      ip link add br1 type bridge
      ip link set br0 address 00:01:02:03:04:05
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 up
      ip link set swp1 up
      ip link set swp0 master br0
      ip link set swp1 master br1
      ip link set br0 up
      ip link set br1 up
      ip link del br1 # 00:01:02:03:04:05 still installed on the CPU port
      ip link del br0 # 00:01:02:03:04:05 finally removed from the CPU port
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev · 4bed397c
      Vladimir Oltean authored
      When
      (a) "dev" is a bridge port which the DSA switch tree offloads, but is
          otherwise not a dsa slave (such as a LAG netdev), or
      (b) "dev" is the bridge net device itself
      
      then strange things happen to the dev_hold/dev_put pair:
      dsa_schedule_work() will still be called with a DSA port that offloads
      that netdev, but dev_hold() will be called on the non-DSA netdev.
      Then the "if" condition in dsa_slave_switchdev_event_work() does not
      pass, because "dev" is not a DSA netdev, so dev_put() is not called.
      
      This results in the simple fact that we have a reference counting
      mismatch on the "dev" net device.
      
      This can be seen when we add support for host addresses installed on the
      bridge net device.
      
      ip link add br1 type bridge
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 master br1
      ip link del br1
      [  968.512278] unregister_netdevice: waiting for br1 to become free. Usage count = 5
      
      It seems foolish to do penny pinching and not add the net_device pointer
      in the dsa_switchdev_event_work structure, so let's finally do that.
      As an added bonus, when we start offloading local entries pointing
      towards the bridge, these will now properly appear as 'offloaded' in
      'bridge fdb' (this was not possible before, because 'dev' was assumed to
      only be a DSA net device):
      
      00:01:02:03:04:05 dev br0 vlan 1 offload master br0 permanent
      00:01:02:03:04:05 dev br0 offload master br0 permanent
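The reference counting mismatch described in this commit can be reproduced with a toy model. The Python sketch below is an illustration under stated assumptions, not the kernel's code: NetDev, hold, put and the two schedule_and_run_* functions are hypothetical names, and the "work" dictionary stands in for the structure that now carries the net_device pointer.

```python
class NetDev:
    """Toy net device with a usage counter."""

    def __init__(self, name, is_dsa):
        self.name = name
        self.is_dsa = is_dsa
        self.refcount = 0


def hold(dev):
    dev.refcount += 1


def put(dev):
    dev.refcount -= 1


def schedule_and_run_buggy(dev):
    """Hold is taken on `dev`, but the worker only releases DSA netdevs."""
    hold(dev)
    if dev.is_dsa:  # a bridge or LAG netdev never passes this check
        put(dev)


def schedule_and_run_fixed(dev):
    """Remember the exact netdev that was held and release that one."""
    work = {"dev": dev}
    hold(work["dev"])
    put(work["dev"])
```

With a non-DSA device such as the bridge itself, the first variant leaves the counter at 1 after every event, which matches the "waiting for br1 to become free" symptom quoted above.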
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: include fdb entries pointing to bridge in the host fdb list · 81a619f7
      Vladimir Oltean authored
      The bridge supports a legacy way of adding local (non-forwarded) FDB
      entries, which works on an individual port basis:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      As well as a new way, added by Roopa Prabhu in commit 3741873b
      ("bridge: allow adding of fdb entries pointing to the bridge device"):
      
      bridge fdb add dev br0 00:01:02:03:04:05 self local
      
      The two commands are functionally equivalent, except that the first one
      produces an entry with fdb->dst == swp0, and the other an entry with
      fdb->dst == NULL. The confusing part, though, is that even if fdb->dst
      is swp0 for the 'local on port' entry, that destination is not used.
      
      Nonetheless, the idea is that the bridge has reference counting for
      local entries, and local entries pointing towards the bridge are still
      'as local' as local entries for a port.
      
      The bridge adds the MAC addresses of the interfaces automatically as
      FDB entries with is_local=1. For the MAC address of the ports, fdb->dst
      will be equal to the port, and for the MAC address of the bridge,
      fdb->dst will point towards the bridge (i.e. be NULL). Therefore, if the
      MAC address of the bridge is not inherited from either of the physical
      ports, then we must explicitly catch local FDB entries emitted towards
      br0, otherwise we'll miss the MAC address of the bridge (and, of
      course, any entry with 'bridge fdb add dev br0 ... self local').
      Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: include bridge addresses which are local in the host fdb list · 10fae4ac
      Tobias Waldekranz authored
      The bridge automatically creates local (not forwarded) fdb entries
      pointing towards physical ports with their interface MAC addresses.
      For switchdev, the significance of these fdb entries is the exact
      opposite of that of non-local entries: instead of sending these frames
      outwards, we must send them inwards (towards the host).
      
      NOTE: The bridge's own MAC address is also "local". If that address is
      not shared with any port, the bridge's MAC is not added by this
      functionality - but the following commit takes care of that case.
      
      NOTE 2: We mark these addresses as host-filtered regardless of the value
      of ds->assisted_learning_on_cpu_port. This is because, as opposed to the
      speculative logic done for dynamic address learning on foreign
      interfaces, the local FDB entries are rather fixed, so there isn't any
      risk of them migrating from one bridge port to another.
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sync static FDB entries on foreign interfaces to hardware · 3068d466
      Vladimir Oltean authored
      DSA is able to install FDB entries towards the CPU port for addresses
      which were dynamically learnt by the software bridge on foreign
      interfaces that are in the same bridge with a DSA switch interface.
      Since this behavior is opportunistic, it is guarded by the
      "assisted_learning_on_cpu_port" property which can be enabled by drivers
      and is not done automatically (since certain switches may support
      address learning of packets coming from the CPU port).
      
      But if those FDB entries added on the foreign interfaces are static
      (added by the user) instead of dynamically learnt, currently DSA does
      not do anything (and arguably it should).
      
      Because static FDB entries are not supposed to move on their own, there
      is no downside in reusing the "assisted_learning_on_cpu_port" logic to
      sync static FDB entries to the DSA CPU port unconditionally, even if
      assisted_learning_on_cpu_port is not requested by the driver.
      
      For example, this situation:
      
         br0
         / \
      swp0 dummy0
      
      $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master static
      
      Results in DSA adding an entry in the hardware FDB, pointing this
      address towards the CPU port.
      
      The same is true for entries added to the bridge itself, e.g.:
      
      $ bridge fdb add 02:00:de:ad:00:01 dev br0 vlan 1 self local
      
      (except that right now, DSA still ignores 'local' FDB entries; this will
      be changed in a later patch)
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: install the host MDB and FDB entries in the master's RX filter · 26ee7b06
      Vladimir Oltean authored
      If the DSA master implements strict address filtering, then the unicast
      and multicast addresses kept by the DSA CPU ports should be synchronized
      with the address lists of the DSA master.
      
      Note that we want the synchronization of the master's address lists even
      if the DSA switch doesn't support unicast/multicast database operations,
      on the premise that the packets will be flooded to the CPU in that
      case, and we should still instruct the master to receive them. This is
      why we do the dev_uc_add() etc first, even if dsa_port_notify() returns
      -EOPNOTSUPP. In turn, dev_uc_add() and friends return error only if
      memory allocation fails, so it is probably ok to check and propagate
      that error code and not just ignore it.
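The ordering argued for in this commit message (sync the master first, then notify the switch) can be sketched as follows. This is a toy Python model under stated assumptions: host_fdb_add, master_add and switch_notify are hypothetical callables that return 0 or a negative errno, mirroring the kernel convention only loosely.

```python
import errno


def host_fdb_add(master_add, switch_notify, mac):
    """Add `mac` to the master's address list first, then tell the switch.

    master_add and switch_notify are hypothetical callables returning 0 on
    success or a negative errno value on failure.
    """
    err = master_add(mac)
    if err:
        return err  # e.g. -ENOMEM: propagate instead of ignoring it
    # The master is already in sync at this point, so an -EOPNOTSUPP from
    # the switch does not undo the RX filter update.
    return switch_notify(mac)
```

The point of the sketch is only the order of the two calls: a switch that cannot program the address still leaves the master instructed to receive it.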
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: reference count the FDB addresses at the cross-chip notifier level · 3f6e32f9
      Vladimir Oltean authored
      The same concerns expressed for host MDB entries are valid for host FDBs
      just as well:
      
      - in the case of multiple bridges spanning the same switch chip, deleting
        a host FDB entry that belongs to one bridge will result in breakage to
        the other bridge
      - not deleting FDB entries across DSA links means that the switch's
        hardware tables will eventually run out, given enough wear and tear
      
      So do the same thing and introduce reference counting for CPU ports and
      DSA links using the same data structures as we have for MDB entries.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: introduce a separate cross-chip notifier type for host FDBs · 3dc80afc
      Vladimir Oltean authored
      DSA treats some bridge FDB entries by trapping them to the CPU port.
      Currently, the only class of such entries is FDB addresses learnt by
      the software bridge on a foreign interface. However, there are many more
      to be added:
      
      - FDB entries with the is_local flag (for termination) added by the
        bridge on the user ports (typically containing the MAC address of the
        bridge port)
      - FDB entries pointing towards the bridge net device (for termination).
        Typically these contain the MAC address of the bridge net device.
      - Static FDB entries installed on a foreign interface that is in the
        same bridge with a DSA user port.
      
      The reason why a separate cross-chip notifier for host FDBs is justified
      compared to normal FDBs is the same as in the case of host MDBs: the
      cross-chip notifier matching function in switch.c should avoid
      installing these entries on routing ports that route towards the
      targeted switch, but not towards the CPU. This is required in order to
      have proper support for H-like multi-chip topologies.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: reference count the MDB entries at the cross-chip notifier level · 161ca59d
      Vladimir Oltean authored
      Ever since the cross-chip notifiers were introduced, the design was
      meant to be simplistic and just get the job done without worrying too
      much about dangling resources left behind.
      
      For example, somebody installs an MDB entry on sw0p0 in this daisy chain
      topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
      sw1p4 and sw2p4.
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
      
      Then the same person deletes that MDB entry. The cross-chip notifier for
      deletion only matches sw0p0:
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
      
      Why?
      
      Because the DSA links are 'trunk' ports, if we just go ahead and delete
      the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
      entries when they are still needed. Just consider the fact that somebody
      does:
      
      - add a multicast MAC address towards sw0p0 [ via the cross-chip
        notifiers it gets installed on the DSA links too ]
      - add the same multicast MAC address towards sw0p1 (another port of that
        same switch)
      - delete the same multicast MAC address from sw0p0.
      
      At this point, if we deleted the MAC address from the DSA links, it
      would be flooded, even though there is still an entry on switch 0 which
      needs it not to be.
      
      So that is why deletions only match the targeted source port and nothing
      on DSA links. Of course, dangling resources means that the hardware
      tables will eventually run out given enough additions/removals, but hey,
      at least it's simple.
      
      But there is a bigger concern which needs to be addressed, and that is
      our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
      object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
      on the upstream port, and a similar thing happens on deletion:
      dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
      upstream port.
      
      When there are 2 VLAN-unaware bridges spanning the same switch (which is
      a use case DSA proudly supports), each bridge will install its own
      SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
      emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
      user ports enslaved to br0 and the user ports enslaved to br1. Not good.
      The host-trapped multicast addresses installed by br1 will be deleted
      when any state changes in br0 (IGMP timers expire, or ports leave, etc).
      
      To avoid this, we could of course go the route of the zero-sum game and
      delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
      design is to just admit that on shared ports like DSA links and CPU
      ports, we should be reference counting calls, even if this consumes some
      dynamic memory which DSA has traditionally avoided. On the flip side,
      the hardware tables of switches are limited in size, so it would be good
      if the OS managed them properly instead of having them eventually
      overflow.
      
      To address the memory usage concern, we only apply the refcounting of
      MDB entries on ports that are really shared (CPU ports and DSA links)
      and not on user ports. In a typical single-switch setup, this means only
      the CPU port (and the host MDB entries are not that many, really).
      
      The name of the newly introduced data structures (dsa_mac_addr) is
      chosen in such a way that will be reusable for host FDB entries (next
      patch).
      
      With this change, we can finally have the same matching logic for the
      MDB additions and deletions, as well as for their host-trapped variants.
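The reference counting on shared ports can be illustrated with a small model. The Python sketch below is not the kernel's dsa_mac_addr implementation: the SharedPort class, its dictionary of counters and the hw_table set are assumptions that only show the intended behaviour when two bridges install the same host address on one CPU port.

```python
class SharedPort:
    """Toy model of a CPU port or DSA link with refcounted addresses."""

    def __init__(self):
        self.refs = {}         # (mac, vid) -> number of users
        self.hw_table = set()  # what the hardware table would contain

    def add(self, mac, vid):
        key = (mac, vid)
        if key in self.refs:
            self.refs[key] += 1  # already programmed, just count the user
            return
        self.hw_table.add(key)   # first user: program the hardware
        self.refs[key] = 1

    def delete(self, mac, vid):
        key = (mac, vid)
        if key not in self.refs:
            return False         # unbalanced delete, nothing to do
        self.refs[key] -= 1
        if self.refs[key] == 0:  # last user gone: remove from hardware
            del self.refs[key]
            self.hw_table.discard(key)
        return True
```

In this model, a deletion triggered by br0 no longer removes an address that br1 still needs, and the entry does leave the hardware table once the last user is gone.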
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: introduce a separate cross-chip notifier type for host MDBs · b8e997c4
      Vladimir Oltean authored
      Commit abd49535 ("net: dsa: execute dsa_switch_mdb_add only for
      routing port in cross-chip topologies") does a surprisingly good job
      even for the SWITCHDEV_OBJ_ID_HOST_MDB use case, where DSA simply
      translates a switchdev object received on dp into a cross-chip notifier
      for dp->cpu_dp.
      
      To visualize how that works, imagine the daisy chain topology below and
      consider a SWITCHDEV_OBJ_ID_HOST_MDB object emitted on sw2p0. How does
      the cross-chip notifier know to match on all the right ports (sw0p4, the
      dedicated CPU port, sw1p4, an upstream DSA link, and sw2p4, another
      upstream DSA link)?
      
                                                      |
             sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
          [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
          [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
          [  user ] [  user ] [  user ] [  user ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
      
      The answer is simple: the dedicated CPU port of sw2p0 is sw0p4, and
      dsa_routing_port returns the upstream port for all switches.
      
      That is fine, but there are other topologies where this does not work as
      well. There are trees with "H" topologies in the wild, where there are 2
      or more switches with DSA links between them, but every switch has its
      dedicated CPU port. For these topologies, it seems stupid for the neighbor
      switches to install an MDB entry on the routing port, since these
      multicast addresses are fundamentally different than the usual ones we
      support (and that is the justification for this patch, to introduce the
      concept of a termination plane multicast MAC address, as opposed to a
      forwarding plane multicast MAC address).
      
      For example, when a SWITCHDEV_OBJ_ID_HOST_MDB would get added to sw0p0,
      without this patch, it would get treated as a regular port MDB on sw0p2
      and it would match on the ports below (including the sw1p3 routing port).
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [   x   ] [       ] [       ] [       ]
      
      With the patch, the host MDB notifier on sw0p0 matches only on the local
      switch, which is what we want for a termination plane address.
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [       ] [       ] [       ] [       ]
      
      Name this new matching function "dsa_switch_host_address_match" since we
      will be reusing it soon for host FDB entries as well.
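The difference between the two topologies can be checked with a toy matcher. The Python sketch below is an assumption-laden model, not the kernel's dsa_switch_host_address_match: the Switch class, path_to_cpu and host_address_match are hypothetical, and each switch is reduced to "the port facing its dedicated CPU" plus "the next switch on that path".

```python
class Switch:
    """Toy switch: its CPU-facing port and the next switch towards the CPU."""

    def __init__(self, name, cpu_facing_port, upstream=None):
        self.name = name
        self.cpu_facing_port = cpu_facing_port
        self.upstream = upstream  # None when the CPU port is on this switch


def path_to_cpu(target):
    """Yield the switches traversed from `target` to its dedicated CPU port."""
    sw = target
    while sw is not None:
        yield sw
        sw = sw.upstream


def host_address_match(sw, port, target):
    """True if a host address emitted on `target` belongs on sw/port."""
    on_path = any(sw is hop for hop in path_to_cpu(target))
    return on_path and port == sw.cpu_facing_port


# Daisy chain from the first diagram: sw2 -> sw1 -> sw0, CPU port on sw0p4.
sw0 = Switch("sw0", cpu_facing_port=4)
sw1 = Switch("sw1", cpu_facing_port=4, upstream=sw0)
sw2 = Switch("sw2", cpu_facing_port=4, upstream=sw1)

# "H" topology: each switch has its own CPU port (p2) and a DSA link (p3).
h0 = Switch("h0", cpu_facing_port=2)
h1 = Switch("h1", cpu_facing_port=2)
```

For the daisy chain this matches sw0p4, sw1p4 and sw2p4; for the "H" topology it matches only the local CPU port and leaves the neighbour's routing port alone.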
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: introduce dsa_is_upstream_port and dsa_switch_is_upstream_of · 63609c8f
      Vladimir Oltean authored
      In preparation for the new cross-chip notifiers for host addresses,
      let's introduce some more topology helpers which we are going to use to
      discern switches that are in our path towards the dedicated CPU port
      from switches that aren't.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del · b117e1e8
      Vladimir Oltean authored
      We want to add reference counting for FDB entries in cross-chip
      topologies, and in order for that to have any chance of working and not
      be unbalanced (leading to entries which are never deleted), we need to
      ensure that higher layers are sane, because if they aren't, it's garbage
      in, garbage out.
      
      For example, if we add a bridge FDB entry twice, the bridge properly
      errors out:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:07 master static
      $ bridge fdb add dev swp0 00:01:02:03:04:07 master static
      RTNETLINK answers: File exists
      
      However, the same thing cannot be said about the bridge bypass
      operations:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ echo $?
      0
      
      But one 'bridge fdb del' is enough to remove the entry, no matter how
      many times it was added.
      
      The bridge bypass operations are impossible to maintain in these
      circumstances and lack of support for reference counting the cross-chip
      notifiers is holding us back from making further progress, so just drop
      support for them. The only way left for users to install static bridge
      FDB entries is the proper one, using the "master static" flags.
      
      With this change, rtnl_fdb_add() falls back to calling
      ndo_dflt_fdb_add() which uses the duplicate-exclusive variant of
      dev_uc_add(): dev_uc_add_excl(). Because DSA does not (yet) declare
      IFF_UNICAST_FLT, this results in us going to promiscuous mode:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:05
      [   28.206743] device swp0 entered promiscuous mode
      $ bridge fdb add dev swp0 00:01:02:03:04:05
      RTNETLINK answers: File exists
      
      So even if it does not completely fail, there is at least some indication
      that it is behaving differently from before, and closer to user space
      expectations, I would argue (the lack of a "local|static" specifier
      defaults to "local", or "host-only", so dev_uc_add() is a reasonable
      default implementation). If the generic implementation of .ndo_fdb_add
      provided by Vlad Yasevich is proof of anything, it only proves that
      the implementation provided by DSA was always wrong, by not looking at
      "ndm->ndm_state & NUD_NOARP" (the "static" flag which means that the FDB
      entry points outwards) and "ndm->ndm_state & NUD_PERMANENT" (the "local"
      flag which means that the FDB entry points towards the host). It all
      used to mean the same thing to DSA.
      
      Update the documentation so that the users are not confused about what's
      going on.
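The behavioural difference between a plain add and a duplicate-exclusive add can be modelled briefly. The Python sketch below is a toy address list, not the kernel's dev_uc_add() or dev_uc_add_excl(): the AddrList class and its method names are assumptions, and only the "second exclusive add fails with EEXIST" behaviour is taken from the commit message.

```python
import errno


class AddrList:
    """Toy unicast address list with a plain and an exclusive add."""

    def __init__(self):
        self.refs = {}  # mac -> number of times it was added

    def add(self, mac):
        """Plain add: duplicates are silently counted."""
        self.refs[mac] = self.refs.get(mac, 0) + 1
        return 0

    def add_excl(self, mac):
        """Exclusive add: a duplicate is reported as -EEXIST."""
        if mac in self.refs:
            return -errno.EEXIST
        self.refs[mac] = 1
        return 0
```

The exclusive variant is what turns the second 'bridge fdb add' in the transcript above into "File exists" rather than a silent success.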
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: allow br_fdb_replay to be called for the bridge device · f851a721
      Vladimir Oltean authored
      When a port joins a bridge which already has local FDB entries pointing
      to the bridge device itself, we would like to offload those, so allow
      the "dev" argument to be equal to the bridge too. The code already does
      what we need in that case.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: send FDB notifications for host addresses · 6eb38bf8
      Tobias Waldekranz authored
      Treat addresses added to the bridge itself in the same way as regular
      ports and send out a notification so that drivers may sync it down to
      the hardware FDB.
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: use READ_ONCE() and WRITE_ONCE() compiler barriers for fdb->dst · 3e19ae7c
      Vladimir Oltean authored
      Annotate the writer side of fdb->dst:
      
      - fdb_create()
      - br_fdb_update()
      - fdb_add_entry()
      - br_fdb_external_learn_add()
      
      with WRITE_ONCE() and the reader side:
      
      - br_fdb_test_addr()
      - br_fdb_update()
      - fdb_fill_info()
      - fdb_add_entry()
      - fdb_delete_by_addr_and_port()
      - br_fdb_external_learn_add()
      - br_switchdev_fdb_notify()
      
      with compiler barriers such that the readers do not attempt to reload
      fdb->dst multiple times, leading to potentially different destination
      ports when the fdb entry is updated concurrently.
      
      This is especially important in read-side sections where fdb->dst is
      used more than once, but let's convert all accesses for the sake of
      uniformity.
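
      The reader-side pattern described above can be sketched in
      userspace C. This is only an illustration, not the bridge code:
      the macros are simplified stand-ins for the kernel ones, and
      struct port, struct fdb_entry and both helpers are hypothetical
      names. The example is single-threaded and only shows the shape of
      the accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (GCC/Clang __typeof__). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct port { int port_no; };            /* hypothetical */
struct fdb_entry { struct port *dst; };  /* hypothetical model of fdb->dst */

/* Writer side: one store that the compiler may not split or repeat. */
static void fdb_set_dst(struct fdb_entry *fdb, struct port *p)
{
	WRITE_ONCE(fdb->dst, p);
}

/* Reader side: load fdb->dst once and use only the local copy, so both
 * uses below see the same port even if a writer updates the entry. */
static int fdb_dst_port_no(struct fdb_entry *fdb)
{
	struct port *dst = READ_ONCE(fdb->dst);

	if (!dst)			/* first use */
		return -1;
	return dst->port_no;		/* second use, same pointer */
}

static void example_usage(void)
{
	struct port swp0 = { .port_no = 1 };
	struct fdb_entry fdb = { .dst = NULL };

	assert(fdb_dst_port_no(&fdb) == -1);
	fdb_set_dst(&fdb, &swp0);
	assert(fdb_dst_port_no(&fdb) == 1);
}
```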
      Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e19ae7c
  2. 28 Jun, 2021 20 commits
    • David S. Miller
      Merge branch 'do_once_lite' · 84fe7399
      David S. Miller authored
      Tanner Love says:
      
      ====================
      net: update netdev_rx_csum_fault() print dump only once
      
      First patch implements DO_ONCE_LITE to abstract uses of the ".data.once"
      trick. It is defined in its own, new header file -- rather than
      alongside the existing DO_ONCE in include/linux/once.h -- because
      include/linux/once.h includes include/linux/jump_label.h, and this
      causes the build to break for some architectures if
      include/linux/once.h is included in include/linux/printk.h or
      include/asm-generic/bug.h.
      
      Second patch uses DO_ONCE_LITE in netdev_rx_csum_fault to print dump
      only once.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84fe7399
    • Tanner Love
      net: update netdev_rx_csum_fault() print dump only once · 127d7355
      Tanner Love authored
      Printing this stack dump multiple times does not provide additional
      useful information, and consumes time in the data path. Printing once
      is sufficient.
      
      Changes
        v2: Format indentation properly
      Signed-off-by: Tanner Love <tannerlove@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      127d7355
    • Tanner Love
      once: implement DO_ONCE_LITE for non-fast-path "do once" functionality · a358f406
      Tanner Love authored
      Certain uses of "do once" functionality reside outside of fast path,
      and so do not require jump label patching via static keys, making
      existing DO_ONCE undesirable in such cases.
      
      Replace uses of __section(".data.once") with DO_ONCE_LITE(_IF)?
      
      This patch changes the return values of xfs_printk_once, printk_once,
      and printk_deferred_once. Before, they returned whether the print was
      performed, but now, they always return true. This is okay because the
      return values of the following macros are entirely ignored throughout
      the kernel:
      - xfs_printk_once
      - xfs_warn_once
      - xfs_notice_once
      - xfs_info_once
      - printk_once
      - pr_emerg_once
      - pr_alert_once
      - pr_crit_once
      - pr_err_once
      - pr_warn_once
      - pr_notice_once
      - pr_info_once
      - pr_devel_once
      - pr_debug_once
      - printk_deferred_once
      - orc_warn
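
      A userspace sketch of the "do once" idea and of the return value
      described above. It assumes GNU statement expressions (GCC/Clang)
      and uses a plain static flag; the kernel macro additionally places
      the flag in the ".data.once" section. dump(), dump_count and
      report_fault() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Each expansion gets its own static flag, so "once" is per call site. */
#define DO_ONCE_LITE_IF(condition, func, ...)			\
	({							\
		static bool already_done_;			\
		bool ret_do_once_ = !!(condition);		\
								\
		if (ret_do_once_ && !already_done_) {		\
			already_done_ = true;			\
			func(__VA_ARGS__);			\
		}						\
		ret_do_once_;					\
	})

#define DO_ONCE_LITE(func, ...) \
	DO_ONCE_LITE_IF(true, func, ##__VA_ARGS__)

static int dump_count;	/* hypothetical: counts how often the dump ran */

static void dump(int delta)
{
	dump_count += delta;
}

/* The macro evaluates to the condition, not to "did it run", so with an
 * always-true condition the result is true on every call. */
static bool report_fault(void)
{
	return DO_ONCE_LITE(dump, 1);
}

static void example_usage(void)
{
	assert(report_fault());		/* first call: dump runs */
	assert(report_fault());		/* later calls: skipped, still true */
	assert(dump_count == 1);
}
```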
      
      Changes
      v3:
        - Expand commit message to explain why changing return values of
          xfs_printk_once, printk_once, printk_deferred_once is benign
      v2:
        - Fix i386 build warnings
      Signed-off-by: Tanner Love <tannerlove@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a358f406
    • Nathan Chancellor
      net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe() · b74ef9f9
      Nathan Chancellor authored
      Clang warns:
      
      drivers/net/ethernet/microchip/sparx5/sparx5_main.c:760:29: warning:
      variable 'mac_addr' is uninitialized when used here [-Wuninitialized]
              if (of_get_mac_address(np, mac_addr)) {
                                         ^~~~~~~~
      drivers/net/ethernet/microchip/sparx5/sparx5_main.c:669:14: note:
      initialize the variable 'mac_addr' to silence this warning
              u8 *mac_addr;
                          ^
                           = NULL
      1 warning generated.
      
      mac_addr is only used to store the value retrieved from
      of_get_mac_address(), which is then copied into the base_mac member of
      the sparx5 struct using ether_addr_copy(). It is easier to just use the
      base_mac address directly, which avoids the warning and the extra copy.
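
      The shape of the fix can be sketched as follows. This is not the
      driver code: get_mac_address() is a hypothetical stand-in for
      of_get_mac_address(), and struct sparx5_like only models the
      base_mac member.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical helper: writes ETH_ALEN bytes into a buffer that the
 * caller must provide, and returns 0 on success. */
static int get_mac_address(uint8_t *addr)
{
	static const uint8_t example[ETH_ALEN] = { 0x02, 0, 0, 0, 0, 0x01 };

	memcpy(addr, example, ETH_ALEN);
	return 0;
}

struct sparx5_like {
	uint8_t base_mac[ETH_ALEN];
};

/* The warned-about shape was "uint8_t *mac_addr;" followed by
 * "get_mac_address(mac_addr);", which passes a pointer that was never
 * pointed at any storage. Passing the destination array directly needs
 * no temporary and no extra copy. */
static int probe_mac(struct sparx5_like *sparx5)
{
	return get_mac_address(sparx5->base_mac);
}

static void example_usage(void)
{
	struct sparx5_like sparx5 = { { 0 } };

	assert(probe_mac(&sparx5) == 0);
	assert(sparx5.base_mac[0] == 0x02);
	assert(sparx5.base_mac[5] == 0x01);
}
```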
      
      Fixes: 3cfa11ba ("net: sparx5: add the basic sparx5 driver")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1413
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b74ef9f9
    • Vladimir Oltean
      net: dsa: sja1105: fix dynamic access to L2 Address Lookup table for SJA1110 · 74e7feff
      Vladimir Oltean authored
      The SJA1105P/Q/R/S and SJA1110 may have the same layout for the command
      to read/write/search for L2 Address Lookup entries, but as explained in
      the comments at the beginning of the sja1105_dynamic_config.c file, the
      command portion of the buffer is at the end, and we need to obtain a
      pointer to it by adding the length of the entry to the buffer.
      
      Alas, the length of an L2 Address Lookup entry is larger in SJA1110 than
      it is for SJA1105P/Q/R/S, so we need to create a common helper to access
      the command buffer, and this receives as argument the length of the
      entry buffer.
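
      A minimal sketch of such a helper, assuming only what the text
      above states: the command follows the entry in the same buffer.
      The lengths below are made-up values for illustration, and
      cmd_ptr() is a hypothetical name, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lengths in bytes; the real ones come from the driver's
 * per-family table definitions. */
#define ENTRY_LEN_SMALL	20
#define ENTRY_LEN_LARGE	24
#define CMD_LEN		4

/* The command sits right after the entry, so its position depends on
 * the entry length and cannot be hardcoded for a single family. */
static uint8_t *cmd_ptr(uint8_t *buf, size_t entry_len)
{
	return buf + entry_len;
}

static void example_usage(void)
{
	uint8_t buf[ENTRY_LEN_LARGE + CMD_LEN] = { 0 };

	/* Same helper, different entry length, different command offset. */
	assert(cmd_ptr(buf, ENTRY_LEN_SMALL) - buf == ENTRY_LEN_SMALL);
	assert(cmd_ptr(buf, ENTRY_LEN_LARGE) - buf == ENTRY_LEN_LARGE);
}
```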
      
      Fixes: 3e77e59b ("net: dsa: sja1105: add support for the SJA1110 switch family")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74e7feff
    • Horatiu Vultur
      net: bridge: mrp: Update the Test frames for MRA · f7458934
      Horatiu Vultur authored
      According to the standard IEC 62439-2, in case the node behaves as MRA
      and needs to send Test frames on ring ports, then these Test frames need
      to have an Option TLV and a Sub-Option TLV which has the type AUTO_MGR.
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7458934
    • David S. Miller
      Merge tag 'for-net-next-2021-06-28' of... · f0305e73
      David S. Miller authored
      Merge tag 'for-net-next-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth-next pull request for net-next:
      
       - Add support for QCA_ROME device (0cf3:e500) and RTL8822CE
       - Update management interface revision to 21
       - Use of inclusive language
       - Proper handling of HCI_LE_Advertising_Set_Terminated event
       - Recovery handling of HCI ncmd=0
       - Various memory fixes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0305e73
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · e1289cfb
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2021-06-28
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      We've added 37 non-merge commits during the last 12 day(s) which contain
      a total of 56 files changed, 394 insertions(+), 380 deletions(-).
      
      The main changes are:
      
      1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.
      
      2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.
      
      3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.
      
      4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.
      
      5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.
      
      6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.
      
      7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.
      
      8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.
      
      9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.
      
      10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.
      
      11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.
      
      12) Fix up bpfilter startup log-level to info level, from Gary Lin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1289cfb
    • Andreas Roeseler
      ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages · 1fd07f33
      Andreas Roeseler authored
      This patch builds off of commit 2b246b25
      and adds functionality to respond to ICMPV6 PROBE requests.
      
      Add icmp_build_probe function to construct PROBE requests for both
      ICMPV4 and ICMPV6.
      
      Modify icmpv6_rcv to detect ICMPV6 PROBE messages and call the
      icmpv6_echo_reply handler.
      
      Modify icmpv6_echo_reply to build a PROBE response message based on the
      queried interface.
      
      This patch has been tested using a branch of the iputils git repo which can
      be found here: https://github.com/Juniper-Clinic-2020/iputils/tree/probe-request
      Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fd07f33
    • Yang Yingliang
      net: sparx5: fix error return code in sparx5_register_notifier_blocks() · 83300c69
      Yang Yingliang authored
      Fix to return a negative error code from the error handling
      case instead of 0, as done elsewhere in this function.
      
      Fixes: d6fce514 ("net: sparx5: add switching support")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83300c69
    • Yang Yingliang
      net: sparx5: fix return value check in sparx5_create_targets() · 8f4c38f7
      Yang Yingliang authored
      In case of error, the function devm_ioremap() returns NULL pointer
      not ERR_PTR(). The IS_ERR() test in the return value check should
      be replaced with NULL test.
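
      Why the IS_ERR() test never fires for a NULL return can be shown
      with a userspace re-creation of the error-pointer encoding. The
      helpers below mirror the idea behind the kernel's IS_ERR() and
      ERR_PTR() but are simplified; map_region() and create_target()
      are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Error pointers live in the top MAX_ERRNO addresses; NULL does not. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Hypothetical mapping helper that, like devm_ioremap(), reports
 * failure by returning NULL rather than an error pointer. */
static void *map_region(bool fail)
{
	static int region;

	return fail ? NULL : &region;
}

static int create_target(bool fail)
{
	void *base = map_region(fail);

	if (!base)		/* is_err(base) would be false here */
		return -ENOMEM;
	return 0;
}

static void example_usage(void)
{
	assert(!is_err(NULL));			/* the missed failure */
	assert(is_err(err_ptr(-ENOMEM)));
	assert(create_target(true) == -ENOMEM);
	assert(create_target(false) == 0);
}
```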
      
      Fixes: 3cfa11ba ("net: sparx5: add the basic sparx5 driver")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f4c38f7
    • Yang Yingliang
      net: sparx5: check return value after calling platform_get_resource() · f00af5cc
      Yang Yingliang authored
      A NULL return from platform_get_resource() leads to a
      null-ptr-deref, so we need to check the return value.
      
      Fixes: 3cfa11ba ("net: sparx5: add the basic sparx5 driver")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f00af5cc
    • David S. Miller
      Merge tag 'mlx5-updates-2021-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4bec3cea
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-06-26
      
      This series provides small updates to mlx5 driver.
      
      1) Increase hairpin buffer size
      
      2) Improve performance in SF allocation
      
      3) Add IPsec support to uplink representor
      
      4) Add stats for number of deleted kTLS TX offloaded connections
      
      5) Add support for flow sampler in SW steering
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4bec3cea
    • David S. Miller
      Merge branch 'bridge-replay-helpers' · 3095f512
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Cleanup for the bridge replay helpers
      
      This patch series brings some improvements to the logic added to the
      bridge and DSA to handle LAG interfaces sandwiched between a bridge and
      a DSA switch port.
      
              br0
              /  \
             /    \
           bond0  swp2
           /  \
          /    \
        swp0  swp1
      
      In particular, it ensures that the switchdev object additions and
      deletions are well balanced per physical port. This is important for
      future work in the area of offloading local bridge FDB entries to
      hardware in the context of DSA requesting a replay of those entries at
      bridge join time (this will be submitted in a future patch series).
      Due to some difficulty ensuring that the deletion of local FDB entries
      pointing towards the bridge device itself is notified to switchdev in
      time (before the switchdev port disconnects from the bridge), this is
      potentially still not the final form in which the replay helpers will
      exist. I'm thinking about moving from the pull mode (in which DSA
      requests the replay) to a push mode (in which the bridge initiates the
      replay). Nonetheless, these preliminary changes are needed either way.
      
      The patch series also addresses some feedback from Nikolay which is long
      overdue by now (sorry).
      
      Switchdev driver maintainers were deliberately omitted due to the
      trivial nature of the driver changes (just a function prototype).
      
      Changes in v2:
      - fix build issue in patch 4 (function prototype mismatch)
      - move switchdev object unsync to the NETDEV_PRECHANGEUPPER code path
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3095f512
    • Vladimir Oltean
      net: dsa: replay a deletion of switchdev objects for ports leaving a bridged LAG · 74918945
      Vladimir Oltean authored
      When a DSA switch port leaves a bonding interface that is under a
      bridge, there might be dangling switchdev objects on that port left
      behind, because the bridge is not aware that its lower interface (the
      bond) changed state in any way.
      
      Call the bridge replay helpers with adding=false before changing
      dp->bridge_dev to NULL, because we need to simulate to
      dsa_slave_port_obj_del() that these notifications were emitted by the
      bridge.
      
      We add this hook to the NETDEV_PRECHANGEUPPER event handler, because
      we are calling into switchdev (and the __switchdev_handle_port_obj_del
      fanout helpers expect the upper/lower adjacency lists to still be valid)
      and PRECHANGEUPPER is the last moment in time when they still are.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74918945
    • Vladimir Oltean
      net: dsa: refactor the prechangeupper sanity checks into a dedicated function · 4ede74e7
      Vladimir Oltean authored
      We need to add more logic to the DSA NETDEV_PRECHANGEUPPER event
      handler, more exactly we need to request an unsync of switchdev objects.
      In order to fit more code, refactor the existing logic into a helper.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ede74e7
    • Vladimir Oltean
      net: bridge: allow the switchdev replay functions to be called for deletion · 7e8c1858
      Vladimir Oltean authored
      When a switchdev port leaves a LAG that is a bridge port, the switchdev
      objects and port attributes offloaded to that port are not removed:
      
      ip link add br0 type bridge
      ip link add bond0 type bond mode 802.3ad
      ip link set swp0 master bond0
      ip link set bond0 master br0
      bridge vlan add dev bond0 vid 100
      ip link set swp0 nomaster
      
      VLAN 100 will remain installed on swp0 despite it going into standalone
      mode, because as far as the bridge is concerned, nothing ever happened
      to its bridge port.
      
      Let's extend the bridge vlan, fdb and mdb replay functions to take a
      'bool adding' argument, and make DSA and ocelot call the replay
      functions with 'adding' as false from the switchdev unsync path, for the
      switch port that leaves the bridge.
      
      Note that this patch in itself does not salvage anything, because in the
      current pull mode of operation, DSA still needs to call the replay
      helpers with adding=false. This will be done in another patch.
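
      The role of the 'adding' argument can be modelled in a few lines
      of userspace C. This is a toy model, not the bridge API: the enum
      stands in for the switchdev add/delete object notifications, and
      struct port, port_obj_event() and replay_vlans() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VID 4096

enum obj_action { OBJ_ADD, OBJ_DEL };	/* stand-ins for add/delete events */

struct port {
	bool vlan[MAX_VID];		/* VLANs installed on this port */
};

static void port_obj_event(struct port *p, enum obj_action action,
			   uint16_t vid)
{
	if (vid < MAX_VID)
		p->vlan[vid] = (action == OBJ_ADD);
}

/* One replay routine serves both directions: the same list of VLANs is
 * walked, and 'adding' only selects which event is emitted. */
static void replay_vlans(struct port *p, const uint16_t *vids, size_t n,
			 bool adding)
{
	enum obj_action action = adding ? OBJ_ADD : OBJ_DEL;

	for (size_t i = 0; i < n; i++)
		port_obj_event(p, action, vids[i]);
}

static void example_usage(void)
{
	static const uint16_t bridge_vids[] = { 100 };
	struct port swp0 = { { false } };

	replay_vlans(&swp0, bridge_vids, 1, true);	/* port joins */
	assert(swp0.vlan[100]);
	replay_vlans(&swp0, bridge_vids, 1, false);	/* port leaves */
	assert(!swp0.vlan[100]);
}
```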
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e8c1858
    • Vladimir Oltean
      net: bridge: constify variables in the replay helpers · bdf123b4
      Vladimir Oltean authored
      Some of the arguments and local variables for the newly added switchdev
      replay helpers can be const, so let's make them so.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bdf123b4
    • Vladimir Oltean
      net: bridge: ignore switchdev events for LAG ports which didn't request replay · 0d2cfbd4
      Vladimir Oltean authored
      There is a slight inconvenience in the switchdev replay helpers added
      recently, and this is when:
      
      ip link add br0 type bridge
      ip link add bond0 type bond
      ip link set bond0 master br0
      bridge vlan add dev bond0 vid 100
      ip link set swp0 master bond0
      ip link set swp1 master bond0
      
      Since the underlying driver (currently only DSA) asks for a replay of
      VLANs when swp0 and swp1 join the LAG because it is bridged, what will
      happen is that DSA will try to react twice on the VLAN event for swp0.
      This is not really a huge problem right now, because most drivers accept
      duplicates since the bridge itself does, but it will become a problem
      when we add support for replaying switchdev object deletions.
      
      Let's fix this by adding a blank void *ctx in the replay helpers, which
      will be passed on by the bridge in the switchdev notifications. If the
      context is NULL, everything is the same as before. But if the context is
      populated with a valid pointer, the underlying switchdev driver
      (currently DSA) can use the pointer to 'see through' the bridge port
      (which in the example above is bond0) and 'know' that the event is only
      for a particular physical port offloading that bridge port, and not for
      all of them.
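
      The context check described above boils down to a small
      predicate. The sketch below assumes the context is simply a
      pointer to the port that requested the replay; struct port and
      port_handles_event() are hypothetical names, not the DSA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct port { const char *name; };	/* hypothetical switch port */

/* ctx == NULL: regular bridge notification, every port under the
 * bridge port acts on it. ctx != NULL: replayed event, only the port
 * that the context points to acts on it. */
static bool port_handles_event(const struct port *p, const void *ctx)
{
	return !ctx || ctx == p;
}

static void example_usage(void)
{
	struct port swp0 = { "swp0" }, swp1 = { "swp1" };

	/* VLAN added on bond0 by the user: both LAG members act. */
	assert(port_handles_event(&swp0, NULL));
	assert(port_handles_event(&swp1, NULL));

	/* Replay requested when swp1 joined bond0: swp0 ignores it. */
	assert(!port_handles_event(&swp0, &swp1));
	assert(port_handles_event(&swp1, &swp1));
}
```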
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d2cfbd4
    • Vladimir Oltean
      net: switchdev: add a context void pointer to struct switchdev_notifier_info · 69bfac96
      Vladimir Oltean authored
      In the case where the driver asks for a replay of a certain type of
      event (port object or attribute) for a bridge port that is a LAG, it may
      do so because this port has just joined the LAG.
      
      But there might already be other switchdev ports in that LAG, and it is
      preferable that those preexisting switchdev ports do not act upon the
      replayed event.
      
      The solution is to add a context to switchdev events, which is NULL most
      of the time (when the bridge layer initiates the call) but which can be
      set to a value controlled by the switchdev driver when a replay is
      requested. The driver can then check the context to figure out if all
      ports within the LAG should act upon the switchdev event, or just the
      ones that match the context.
      
      We have to modify all switchdev_handle_* helper functions as well as the
      prototypes in the drivers that use these helpers too, because these
      helpers hide the underlying struct switchdev_notifier_info from us and
      there is no way to retrieve the context otherwise.
      
      The context structure will be populated and used in later patches.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69bfac96