1. 12 Dec, 2021 6 commits
    • net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q · d38049bb
      Vladimir Oltean authored
      When the ocelot-8021q driver was converted to deferred xmit as part of
      commit 8d5f7954 ("net: dsa: felix: break at first CPU port during
      init and teardown"), the deferred implementation was deliberately made
      subtly different from what sja1105 has.
      
      The implementation differences were based on the following observations:
      
      - There might be a race between these two lines in tag_sja1105.c:
      
             skb_queue_tail(&sp->xmit_queue, skb_get(skb));
             kthread_queue_work(sp->xmit_worker, &sp->xmit_work);
      
        and the skb dequeue logic in sja1105_port_deferred_xmit(). For
        example, the xmit_work might already be queued while the work item
        has just finished walking through the skb queue. Because we don't
        check the return code from kthread_queue_work(), we don't do anything
        if the work item is already queued.
      
        As a result, nobody will take that skb and send it, at least until the
        next timestampable skb is sent. This creates additional (and
        avoidable) TX timestamping latency.
      
        To close that race, the ocelot-8021q driver doesn't keep a single
        work item per port plus an skb timestamping queue, but rather
        dynamically allocates a work item per packet.
      
      - It is also unnecessary to have more than one kthread that does the
        work. So delete the per-port kthread allocations and replace them with
        a single kthread which is global to the switch.
      
      This change brings the two implementations in line by applying those
      observations to the sja1105 driver as well.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
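The per-packet work item idea described in this commit can be sketched in plain userspace C. This is only an illustrative model, not the driver code: `struct work`, `struct worker`, `defer_xmit()` and `worker_run()` are hypothetical stand-ins for the kernel's kthread_work machinery, and an integer `pkt_id` stands in for the skb. Because every packet carries its own work item, queuing can never hit the "already queued" case, and one worker serves all ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kthread_work / kthread_worker. */
struct work {
	struct work *next;
	void (*fn)(struct work *w);
};

struct worker {
	struct work *head, *tail;	/* FIFO of pending work items */
};

/* One work item per deferred packet, so it is never "already queued". */
struct deferred_xmit_work {
	struct work work;
	int port;
	int pkt_id;			/* stands in for the skb */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int sent[16];				/* record of transmitted packets */
int n_sent;

static void queue_work(struct worker *wk, struct work *w)
{
	w->next = NULL;
	if (wk->tail)
		wk->tail->next = w;
	else
		wk->head = w;
	wk->tail = w;
}

/* Work function: "transmit" the packet, then free its work item. */
static void port_deferred_xmit(struct work *w)
{
	struct deferred_xmit_work *xw =
		container_of(w, struct deferred_xmit_work, work);

	assert(n_sent < (int)(sizeof(sent) / sizeof(sent[0])));
	sent[n_sent++] = xw->pkt_id;
	free(xw);
}

/* Tagger xmit path: allocate a work item for this packet and queue it. */
int defer_xmit(struct worker *wk, int port, int pkt_id)
{
	struct deferred_xmit_work *xw = malloc(sizeof(*xw));

	if (!xw)
		return -1;
	xw->work.fn = port_deferred_xmit;
	xw->port = port;
	xw->pkt_id = pkt_id;
	queue_work(wk, &xw->work);
	return 0;
}

/* Single switch-wide worker: run every pending work item in FIFO order. */
void worker_run(struct worker *wk)
{
	while (wk->head) {
		struct work *w = wk->head;

		wk->head = w->next;
		if (!wk->head)
			wk->tail = NULL;
		w->fn(w);
	}
}
```

In this model, a packet deferred right after the worker has drained its list simply gets a fresh work item and is picked up by the next `worker_run()` call, rather than waiting for another packet to trigger the shared work item.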
    • net: dsa: sja1105: let deferred packets time out when sent to ports going down · a3d74295
      Vladimir Oltean authored
      This code is not necessary and complicates the conversion of this driver
      to tagger-owned memory. If a PTP packet is sent concurrently with the
      port getting disabled, the deferred xmit mechanism is robust enough to
      time out when it sees that the packet hasn't been delivered, and to
      recover.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_ocelot: convert to tagger-owned data · 35d97680
      Vladimir Oltean authored
      The felix driver makes very light use of dp->priv, and the tagger is
      effectively stateless. dp->priv is practically only needed to set up a
      callback to perform deferred xmit of PTP and STP packets using the
      ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no
      use of dp->priv, although this driver sets up dp->priv irrespective of
      actual tagging protocol in use).
      
      struct felix_port (what used to be pointed to by dp->priv) is removed
      and replaced with a two-sided structure. The public side of this
      structure, visible to the switch driver, is ocelot_8021q_tagger_data.
      The private side is ocelot_8021q_tagger_private, and the latter
      structure physically encapsulates the former. The public half of the
      tagger data structure can be accessed through a helper of the same name
      (ocelot_8021q_tagger_data) which also sanity-checks the protocol
      currently in use by the switch. The public/private split was requested
      by Andrew Lunn.
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
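The public/private split described in this commit can be illustrated with a small userspace C model. All names here (`struct sw`, `tagger_data_8021q`, `tagger_private_8021q`, `tagger_connect()` and so on) are hypothetical and only mirror the shape of the design; in particular, returning NULL on a protocol mismatch is an assumption of this sketch, since the commit message only says that the helper sanity-checks the protocol in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum tag_proto { TAG_PROTO_OCELOT, TAG_PROTO_OCELOT_8021Q };

/* Public half: the only part the switch driver is meant to touch. */
struct tagger_data_8021q {
	void (*xmit_work_fn)(int pkt_id);
};

/* Private half: physically encapsulates the public half. */
struct tagger_private_8021q {
	struct tagger_data_8021q data;
	int xmit_worker_id;		/* stand-in for tagger-only state */
};

struct sw {
	enum tag_proto proto;		/* tagging protocol currently in use */
	void *tagger_data;		/* storage owned by the tagger */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Tagger side: allocate the private structure, publish the public half. */
int tagger_connect(struct sw *ds)
{
	struct tagger_private_8021q *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;
	priv->xmit_worker_id = 42;
	ds->proto = TAG_PROTO_OCELOT_8021Q;
	ds->tagger_data = &priv->data;
	return 0;
}

/* Tagger side: recover the private structure from its public half. */
struct tagger_private_8021q *tagger_private(struct tagger_data_8021q *data)
{
	assert(data);
	return container_of(data, struct tagger_private_8021q, data);
}

/* Tagger side: free what tagger_connect() allocated. */
void tagger_disconnect(struct sw *ds)
{
	if (!ds->tagger_data)
		return;
	free(tagger_private(ds->tagger_data));
	ds->tagger_data = NULL;
}

/*
 * Switch driver side: access the public half, but only if the switch is
 * really using the protocol this structure belongs to.
 */
struct tagger_data_8021q *tagger_data_8021q(struct sw *ds)
{
	if (ds->proto != TAG_PROTO_OCELOT_8021Q)
		return NULL;
	return ds->tagger_data;
}
```

The switch driver only ever sees `struct tagger_data_8021q`, so the tagger remains free to change its private state without touching the driver.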
    • net: dsa: introduce tagger-owned storage for private and shared data · dc452a47
      Vladimir Oltean authored
      Ansuel is working on register access over Ethernet for the qca8k switch
      family. This requires the qca8k tagging protocol driver to receive
      frames which aren't intended for the network stack, but instead for the
      qca8k switch driver itself.
      
      dp->priv is currently the prevailing method for passing data back
      and forth between the tagging protocol driver and the switch driver.
      However, this method is riddled with caveats.
      
      The DSA design allows in principle for any switch driver to return any
      protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be
      modified to do just that. But in the current design, the memory behind
      dp->priv has to be allocated by the switch driver, so if the tagging
      protocol is paired to an unexpected switch driver, we may end up in NULL
      pointer dereferences inside the kernel, or worse (a switch driver may
      allocate dp->priv according to the expectations of a different tagger).
      
      The latter possibility is even more plausible considering that DSA
      switches can dynamically change tagging protocols in certain cases
      (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends
      itself to mistakes that are all too easy to make.
      
      This patch proposes that the tagging protocol driver should manage its
      own memory, instead of relying on the switch driver to do so.
      After analyzing the different in-tree needs, it can be observed that the
      required tagger storage is per switch, therefore a ds->tagger_data
      pointer is introduced. In principle, per-port storage could also be
      introduced, although there is no need for it at the moment. Future
      changes will replace the current usage of dp->priv with ds->tagger_data.
      
      We define a "binding" event between the DSA switch tree and the tagging
      protocol. During this binding event, the tagging protocol's ->connect()
      method is called first, and this may allocate some memory for each
      switch of the tree. Then a cross-chip notifier is emitted for the
      switches within that tree, and they are given the opportunity to fix up
      the tagger's memory (for example, they might set up some function
      pointers that represent virtual methods for consuming packets).
      Because the memory is owned by the tagger, there exists a ->disconnect()
      method for the tagger (which is the place to free the resources), but
      there doesn't exist a ->disconnect() method for the switch driver.
      This is part of the design. The switch driver should make minimal use of
      the public part of the tagger data, and only after type-checking it
      using the supplied "proto" argument.
      
      In the code there are in fact two binding events. The first is the
      initial event in dsa_switch_setup_tag_protocol(). At this stage, the
      cross-chip notifier chains aren't initialized, so we call each switch's
      connect() method by hand. The second is dsa_tree_bind_tag_proto(),
      called during dsa_tree_change_tag_proto(), where we have an old protocol
      and a new one. We first connect to the new one before disconnecting from
      the old one, to simplify error handling a bit and to ensure we remain in
      a valid state at all times.
      Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
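The binding sequence described in this commit (connect to the new tagger, let the switch fix up the tagger's data after checking the protocol, and only then disconnect from the old tagger) can be modelled in userspace C roughly as follows. This is a simplified sketch under stated assumptions, not the DSA core code: the protocols `PROTO_A` and `PROTO_B`, the `rx_hook` callback, `switch_fixup()` and `sw_change_tag_proto()` are hypothetical, and a `connect()` that returns the new storage is a simplification chosen to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum tag_proto { PROTO_A, PROTO_B };

/* Tagger operations: the tagger allocates and frees its own storage. */
struct tagger_ops {
	enum tag_proto proto;
	void *(*connect)(void);		/* returns storage, NULL on failure */
	void (*disconnect)(void *data);	/* frees that storage */
};

struct sw {
	const struct tagger_ops *ops;
	void *tagger_data;		/* owned by the current tagger */
};

/* Public data of hypothetical tagger B: a hook the switch may fill in. */
struct proto_b_data {
	int (*rx_hook)(int frame);
};

int live_allocs;			/* tagger storage currently allocated */

static void *tagger_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tagger_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

static void *a_connect(void) { return tagger_alloc(sizeof(int)); }
static void *b_connect(void) { return tagger_alloc(sizeof(struct proto_b_data)); }
static void *broken_connect(void) { return NULL; }	/* error path demo */

const struct tagger_ops tagger_a = { PROTO_A, a_connect, tagger_free };
const struct tagger_ops tagger_b = { PROTO_B, b_connect, tagger_free };
const struct tagger_ops tagger_broken = { PROTO_A, broken_connect, tagger_free };

static int b_rx(int frame) { return frame + 1; }

/* Switch driver hook: type-check via proto before touching tagger data. */
static int switch_fixup(enum tag_proto proto, void *data)
{
	if (proto == PROTO_B) {
		struct proto_b_data *b = data;

		b->rx_hook = b_rx;
	}
	return 0;			/* other protocols need no setup here */
}

int sw_change_tag_proto(struct sw *ds, const struct tagger_ops *new_ops)
{
	const struct tagger_ops *old_ops = ds->ops;
	void *old_data = ds->tagger_data;
	void *new_data;
	int err;

	assert(new_ops && new_ops->connect && new_ops->disconnect);

	/* Connect to the new tagger first; on failure the old one stays. */
	new_data = new_ops->connect();
	if (!new_data)
		return -ENOMEM;

	/* Give the switch a chance to fix up the new tagger's data. */
	err = switch_fixup(new_ops->proto, new_data);
	if (err) {
		new_ops->disconnect(new_data);
		return err;
	}

	ds->ops = new_ops;
	ds->tagger_data = new_data;

	/* Only now disconnect from the old tagger, which frees its storage. */
	if (old_ops)
		old_ops->disconnect(old_data);
	return 0;
}
```

Because the old tagger is released only after the new one is fully connected, a failed change (for example to `tagger_broken`) leaves the switch bound to its previous tagger with its storage intact.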
    • net: dsa: mv88e6xxx: Add tx fwd offload PVT on intermediate devices · e0068620
      Tobias Waldekranz authored
      In a typical mv88e6xxx switch tree like this:
      
        CPU
         |    .----.
      .--0--. | .--0--.
      | sw0 | | | sw1 |
      '-1-2-' | '-1-2-'
          '---'
      
      If sw1p{1,2} are added to a bridge that sw0p1 is not a part of, sw0
      still needs to add a crosschip PVT entry for the virtual DSA device
      assigned to represent the bridge.
      
      Fixes: ce5df689 ("net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Enable neighbor sysctls that is save for userns root · 8c8b7aa7
      xu xin authored
      Inside a netns owned by a non-init userns, the ARP/neighbor sysctls are
      currently neither visible nor configurable.
      
      For the attributes these sysctls correspond to, any modification affects
      networking performance (ARP especially) only within the scope of that
      netns, and does not affect other netns.
      
      In fact, some tools can already modify these attributes via netlink;
      iproute2 is one example, as shown below:
      
      $ unshare -ur -n
      $ cat /proc/sys/net/ipv4/neigh/lo/retrans_time
      cat: can't open '/proc/sys/net/ipv4/neigh/lo/retrans_time': No such file or directory
      $ ip ntable show dev lo
      inet arp_cache
          dev lo
          refcnt 1 reachable 19494 base_reachable 30000 retrans 1000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000
      
      inet6 ndisc_cache
          dev lo
          refcnt 1 reachable 42394 base_reachable 30000 retrans 1000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0
      $ ip ntable change name arp_cache dev <if> retrans 2000
      inet arp_cache
          dev lo
          refcnt 1 reachable 22917 base_reachable 30000 retrans 2000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000
      
      inet6 ndisc_cache
          dev lo
          refcnt 1 reachable 35524 base_reachable 30000 retrans 1000
          gc_stale 60000 delay_probe 5000 queue 101
          app_probes 0 ucast_probes 3 mcast_probes 3
          anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Acked-by: Joanne Koong <joannekoong@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 11 Dec, 2021 13 commits
  3. 10 Dec, 2021 21 commits