1. 05 Aug, 2021 25 commits
    • net/ipv4/ipv6: Replace one-element arraya with flexible-array members · db243b79
      Gustavo A. R. Silva authored
      There is a regular need in the kernel to provide a way to declare having
      a dynamically sized set of trailing elements in a structure. Kernel code
      should always use “flexible array members”[1] for these cases. The older
      style of one-element or zero-length arrays should no longer be used[2].
      
      Use an anonymous union with a couple of anonymous structs in order to
      keep userspace unchanged and refactor the related code accordingly:
      
      $ pahole -C group_filter net/ipv4/ip_sockglue.o
      struct group_filter {
      	union {
      		struct {
      			__u32      gf_interface_aux;     /*     0     4 */
      
      			/* XXX 4 bytes hole, try to pack */
      
      			struct __kernel_sockaddr_storage gf_group_aux; /*     8   128 */
      			/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
      			__u32      gf_fmode_aux;         /*   136     4 */
      			__u32      gf_numsrc_aux;        /*   140     4 */
      			struct __kernel_sockaddr_storage gf_slist[1]; /*   144   128 */
      		};                                       /*     0   272 */
      		struct {
      			__u32      gf_interface;         /*     0     4 */
      
      			/* XXX 4 bytes hole, try to pack */
      
      			struct __kernel_sockaddr_storage gf_group; /*     8   128 */
      			/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
      			__u32      gf_fmode;             /*   136     4 */
      			__u32      gf_numsrc;            /*   140     4 */
      			struct __kernel_sockaddr_storage gf_slist_flex[0]; /*   144     0 */
      		};                                       /*     0   144 */
      	};                                               /*     0   272 */
      
      	/* size: 272, cachelines: 5, members: 1 */
      	/* last cacheline: 16 bytes */
      };
      
      $ pahole -C compat_group_filter net/ipv4/ip_sockglue.o
      struct compat_group_filter {
      	union {
      		struct {
      			__u32      gf_interface_aux;     /*     0     4 */
      			struct __kernel_sockaddr_storage gf_group_aux __attribute__((__aligned__(4))); /*     4   128 */
      			/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
      			__u32      gf_fmode_aux;         /*   132     4 */
      			__u32      gf_numsrc_aux;        /*   136     4 */
      			struct __kernel_sockaddr_storage gf_slist[1] __attribute__((__aligned__(4))); /*   140   128 */
      		} __attribute__((__packed__)) __attribute__((__aligned__(4)));                     /*     0   268 */
      		struct {
      			__u32      gf_interface;         /*     0     4 */
      			struct __kernel_sockaddr_storage gf_group __attribute__((__aligned__(4))); /*     4   128 */
      			/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
      			__u32      gf_fmode;             /*   132     4 */
      			__u32      gf_numsrc;            /*   136     4 */
      			struct __kernel_sockaddr_storage gf_slist_flex[0] __attribute__((__aligned__(4))); /*   140     0 */
      		} __attribute__((__packed__)) __attribute__((__aligned__(4)));                     /*     0   140 */
      	} __attribute__((__aligned__(1)));               /*     0   268 */
      
      	/* size: 268, cachelines: 5, members: 1 */
      	/* forced alignments: 1 */
      	/* last cacheline: 12 bytes */
      } __attribute__((__packed__));
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
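
      A rough userspace sketch of the anonymous-union approach follows. demo_filter and demo_storage are hypothetical mirrors of the pahole layout above (demo_storage stands in for struct __kernel_sockaddr_storage). Like that layout, the sketch relies on the compiler accepting a flexible array member inside a nested struct as an extension. The legacy view keeps sizeof() unchanged, while size calculations can use the offset of the flexible member:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct __kernel_sockaddr_storage: 128 bytes, 8-byte aligned. */
struct demo_storage {
	_Alignas(8) unsigned char data[128];
};

/* Hypothetical mirror of the group_filter layout shown by pahole. */
struct demo_filter {
	union {
		struct {			/* legacy view, unchanged for userspace */
			uint32_t interface_aux;
			struct demo_storage group_aux;
			uint32_t fmode_aux;
			uint32_t numsrc_aux;
			struct demo_storage slist[1];
		};
		struct {			/* new view with a flexible array member */
			uint32_t interface;
			struct demo_storage group;
			uint32_t fmode;
			uint32_t numsrc;
			struct demo_storage slist_flex[];
		};
	};
};

/* Both views must place the source list at the same offset. */
static_assert(offsetof(struct demo_filter, slist) ==
	      offsetof(struct demo_filter, slist_flex),
	      "legacy and flexible source lists must alias");

/* Bytes needed for a filter carrying numsrc source addresses. */
static size_t demo_filter_size(uint32_t numsrc)
{
	return offsetof(struct demo_filter, slist_flex) +
	       (size_t)numsrc * sizeof(struct demo_storage);
}
```

      With this layout sizeof(struct demo_filter) stays at 272 bytes, matching the pahole output, and demo_filter_size(1) equals that size, so a single-source request needs the same amount of memory under either view.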
      
      [1] https://en.wikipedia.org/wiki/Flexible_array_member
      [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
      
      Link: https://github.com/KSPP/linux/issues/79
      Link: https://github.com/KSPP/linux/issues/109
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-ioctl-fixes' · d15040a3
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: fix recent ioctl changes
      
      These are three fixes for the recent removal of the bridge's ndo_do_ioctl
      done by commit ad2f99ae ("net: bridge: move bridge ioctls out of
      .ndo_do_ioctl"). Patch 01 fixes a deadlock between the new bridge ioctl
      hook lock and rtnl by taking a netdev reference and always taking the
      bridge ioctl lock first, then rtnl, from within the bridge hook.
      Patch 02 fixes the device name argument of old_deviceless() bridge calls,
      and patch 03 checks, in dev_ifsioc()'s SIOCBRADD/DELIF cases, whether the
      netdevice is actually a bridge before interpreting its private pointer as
      a net_bridge.
      
      Patch 01 was tested by running old bridge-utils commands with lockdep
      enabled. Patch 02 was tested again with bridge-utils, issuing the
      respective ioctl calls on an "up" bridge device. Patch 03 was tested by
      using the addif ioctl on a non-bridge device (e.g. loopback).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: don't call SIOCBRADD/DELIF for non-bridge devices · 9384eacd
      Nikolay Aleksandrov authored
      Commit ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
      changed SIOCBRADD/DELIF to use the bridge's ioctl hook (br_ioctl_hook)
      without checking whether the target netdevice is actually a bridge. This
      can cause crashes and, more generally, leads to other devices' private
      pointers being interpreted as net_bridge pointers.
      
      Crash example (lo - loopback):
      $ brctl addif lo ens16
       BUG: kernel NULL pointer dereference, address: 0000000000000598
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP NOPTI
       CPU: 2 PID: 1376 Comm: brctl Kdump: loaded Tainted: G        W         5.14.0-rc3+ #405
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
       RIP: 0010:add_del_if+0x1f/0x7c [bridge]
       Code: 80 bf 1b a0 41 5c e9 c0 3c 03 e1 0f 1f 44 00 00 41 55 41 54 41 89 f4 be 0c 00 00 00 55 48 89 fd 53 48 8b 87 88 00 00 00 89 d3 <4c> 8b a8 98 05 00 00 49 8b bd d0 00 00 00 e8 17 d7 f3 e0 84 c0 74
       RSP: 0018:ffff888109d97cb0 EFLAGS: 00010202
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff888101239bc0
       RBP: ffff888101239bc0 R08: 0000000000000001 R09: 0000000000000000
       R10: ffff888109d97cd8 R11: 00000000000000a3 R12: 0000000000000012
       R13: 0000000000000000 R14: ffff888101239bc0 R15: ffff888109d97e10
       FS:  00007fc1e365b540(0000) GS:ffff88822be80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000598 CR3: 0000000106506000 CR4: 00000000000006e0
       Call Trace:
        br_ioctl_stub+0x7c/0x441 [bridge]
        br_ioctl_call+0x6d/0x8a
        dev_ifsioc+0x325/0x4e8
        dev_ioctl+0x46b/0x4e1
        sock_do_ioctl+0x7b/0xad
        sock_ioctl+0x2de/0x2f2
        vfs_ioctl+0x1e/0x2b
        __do_sys_ioctl+0x63/0x86
        do_syscall_64+0xcb/0xf2
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7fc1e3589427
       Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
       RSP: 002b:00007ffc8d501d38 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
       RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fc1e3589427
       RDX: 00007ffc8d501d60 RSI: 00000000000089a3 RDI: 0000000000000003
       RBP: 00007ffc8d501d60 R08: 0000000000000000 R09: fefefeff77686d74
       R10: fffffffffffff8f9 R11: 0000000000000202 R12: 00007ffc8d502e06
       R13: 00007ffc8d502e06 R14: 0000000000000000 R15: 0000000000000000
       Modules linked in: bridge stp llc bonding ipv6 virtio_net [last unloaded: llc]
       CR2: 0000000000000598
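
      The missing check amounts to testing a device's type before reinterpreting its private data. Below is a minimal userspace model in which demo_net_device, DEMO_IFF_EBRIDGE and demo_add_del_if() are hypothetical stand-ins for struct net_device, the bridge private flag and the SIOCBRADDIF/SIOCBRDELIF path. In the kernel, netif_is_bridge_master() performs this kind of test; the actual patch may differ in detail:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_IFF_EBRIDGE 0x2u		/* hypothetical "this is a bridge" flag */

struct demo_bridge {
	int num_ports;
};

struct demo_net_device {
	unsigned int priv_flags;
	void *priv;			/* type depends on the kind of device */
};

static int demo_is_bridge(const struct demo_net_device *dev)
{
	return (dev->priv_flags & DEMO_IFF_EBRIDGE) != 0;
}

/* Add (add != 0) or remove a port; refuse to touch non-bridge devices. */
static int demo_add_del_if(struct demo_net_device *dev, int add)
{
	struct demo_bridge *br;

	if (!demo_is_bridge(dev))
		return -EOPNOTSUPP;	/* never reinterpret another device's priv */

	br = dev->priv;
	assert(br != NULL);		/* a bridge always carries its private data */
	br->num_ports += add ? 1 : -1;
	return 0;
}
```

      In this model a loopback-like device without the flag is rejected with -EOPNOTSUPP instead of having its private pointer dereferenced as a bridge.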
      
      Reported-by: syzbot+79f4a8692e267bdb7227@syzkaller.appspotmail.com
      Fixes: ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fix ioctl old_deviceless bridge argument · cbd7ad29
      Nikolay Aleksandrov authored
      Commit ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
      changed the source of the argument copy in the bridge's old_deviceless()
      from args[1] (user pointer to the device name) to uarg (pointer to the
      ioctl arguments), causing the wrong device name to be used.
      
      Example (broken, bridge exists but is up):
      $ brctl delbr bridge
      bridge bridge doesn't exist; can't delete it
      
      Example (working):
      $ brctl delbr bridge
      bridge bridge is still up; can't delete it
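
      The mix-up is between the address of the argument array (uarg) and the name pointer stored inside it (args[1]). A hypothetical userspace model, with memcpy() and strncpy() standing in for copy_from_user() and an assumed three-element argument array:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16

/* uarg points at the argument array; args[1] holds the name's address. */
static void demo_get_bridge_name(const void *uarg, char buf[DEMO_IFNAMSIZ])
{
	unsigned long args[3];		/* assumed layout: cmd, name pointer, spare */

	assert(uarg != NULL);
	memcpy(args, uarg, sizeof(args));	/* copy the argument array */

	/* Correct: follow args[1]. Copying from uarg would read the array itself. */
	strncpy(buf, (const char *)(uintptr_t)args[1], DEMO_IFNAMSIZ);
	buf[DEMO_IFNAMSIZ - 1] = '\0';
}
```

      Copying the name from uarg instead would read the bytes of the argument array (the command number and pointer values), which is consistent with brctl reporting that an existing bridge does not exist.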
      
      Fixes: ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fix ioctl locking · 893b1958
      Nikolay Aleksandrov authored
      Before commit ad2f99ae ("net: bridge: move bridge ioctls out of
      .ndo_do_ioctl"), the bridge ioctl calls were divided into two parts:
      the deviceless one, called by sock_ioctl(), did not expect rtnl to be
      held, while the one with a device, called by dev_ifsioc(), expected rtnl
      to be held. After the commit above they were united in a single ioctl
      stub, but it did not take these locking expectations into account. As a
      result:
      For sock_ioctl now we acquire  (1) br_ioctl_mutex, (2) rtnl
      and for dev_ifsioc we acquire  (1) rtnl,           (2) br_ioctl_mutex
      
      The fix is to take a reference on the netdev for dev_ifsioc() calls, drop
      rtnl, and reacquire it in the bridge ioctl stub after br_ioctl_mutex has
      been acquired. This avoids playing locking games and makes the rule
      straightforward: we always take br_ioctl_mutex first, and then rtnl.
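
      The ordering rule can be checked with a toy, lockdep-style model. Everything below is hypothetical: integer ranks stand in for br_ioctl_mutex and rtnl, and a plain counter stands in for the netdev reference (dev_hold()/dev_put() in the kernel). The checker flags any acquisition made while a lock of equal or higher rank is already held:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: br_ioctl_mutex must always be taken before rtnl. */
enum { DEMO_BR_IOCTL_MUTEX = 1, DEMO_RTNL = 2 };

struct demo_lockdep {
	int held[4];		/* stack of currently held lock ranks */
	int depth;
	bool violation;		/* set on an out-of-order acquisition */
};

static void demo_lock(struct demo_lockdep *ld, int rank)
{
	assert(ld->depth < 4);
	for (int i = 0; i < ld->depth; i++)
		if (ld->held[i] >= rank)
			ld->violation = true;
	ld->held[ld->depth++] = rank;
}

static void demo_unlock(struct demo_lockdep *ld, int rank)
{
	assert(ld->depth > 0 && ld->held[ld->depth - 1] == rank);
	ld->depth--;
}

/* Bridge ioctl stub: br_ioctl_mutex first, then rtnl. */
static void demo_br_ioctl_stub(struct demo_lockdep *ld)
{
	demo_lock(ld, DEMO_BR_IOCTL_MUTEX);
	demo_lock(ld, DEMO_RTNL);
	/* ... bridge work under both locks ... */
	demo_unlock(ld, DEMO_RTNL);
	demo_unlock(ld, DEMO_BR_IOCTL_MUTEX);
}

/* dev_ifsioc()-like caller, entered with rtnl held. */
static void demo_dev_ifsioc(struct demo_lockdep *ld, int *dev_refcnt)
{
	(*dev_refcnt)++;		/* keep the device alive without rtnl */
	demo_unlock(ld, DEMO_RTNL);
	demo_br_ioctl_stub(ld);
	demo_lock(ld, DEMO_RTNL);	/* restore the caller's locking state */
	(*dev_refcnt)--;
}
```

      In this model, calling demo_dev_ifsioc() with rtnl held produces no violation, whereas locking DEMO_RTNL and then DEMO_BR_IOCTL_MUTEX, as the pre-fix dev_ifsioc() path did, sets the violation flag.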
      
      Reported-by: syzbot+34fe5894623c4ab1b379@syzkaller.appspotmail.com
      Fixes: ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: Revert use of struct_size() helper · 4167a960
      Gustavo A. R. Silva authored
      Revert the use of struct_size() and stay with IP_MSFILTER_SIZE() for
      now because, in this case, the size of struct ip_msfilter didn't change
      with the addition of the flexible array imsf_slist_flex[]. So, if we use
      struct_size(), we will be allocating and calculating the size of
      struct ip_msfilter with one too many items for imsf_slist_flex[].
      
      We might use struct_size() in the future, but for now let's stay
      with IP_MSFILTER_SIZE().
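
      The one-too-many arithmetic can be reproduced with a hypothetical mirror of the union layout (demo_msfilter is not the kernel's struct ip_msfilter). Because the legacy one-element view is still part of the union, sizeof() already counts one source. A struct_size()-style sum (written out by hand here, without the kernel helper's overflow checking) adds n more, while DEMO_MSFILTER_SIZE(), modeled on the shape of IP_MSFILTER_SIZE(), subtracts that element first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the ip_msfilter layout after the flexible-array change. */
struct demo_msfilter {
	union {
		struct {
			uint32_t multiaddr_aux;
			uint32_t interface_aux;
			uint32_t fmode_aux;
			uint32_t numsrc_aux;
			uint32_t slist[1];	/* legacy one-element array */
		};
		struct {
			uint32_t multiaddr;
			uint32_t interface;
			uint32_t fmode;
			uint32_t numsrc;
			uint32_t slist_flex[];	/* flexible array member */
		};
	};
};

static_assert(offsetof(struct demo_msfilter, slist) ==
	      offsetof(struct demo_msfilter, slist_flex),
	      "both views share the source list");

/* IP_MSFILTER_SIZE()-style: drop the element already counted by sizeof(). */
#define DEMO_MSFILTER_SIZE(numsrc) \
	(sizeof(struct demo_msfilter) - sizeof(uint32_t) + \
	 (size_t)(numsrc) * sizeof(uint32_t))

/* struct_size()-style sum: sizeof() plus n elements, no overflow checks. */
static size_t demo_struct_size_like(size_t numsrc)
{
	return sizeof(struct demo_msfilter) + numsrc * sizeof(uint32_t);
}
```

      For two sources this gives 24 bytes versus 28 bytes: exactly one extra 32-bit entry.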
      
      Fixes: 2d3e5caf ("net/ipv4: Replace one-element array with flexible-array member")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix GRO skb truesize update · af352460
      Paolo Abeni authored
      Commit 5e10da53 ("skbuff: allow 'slow_gro' for skb carring sock
      reference") introduced a serious regression in the GRO layer by setting
      the wrong truesize for stolen-head skbs.
      
      Restore the correct truesize: SKB_DATA_ALIGN(...) instead of
      SKB_TRUESIZE(...)
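
      For context, SKB_DATA_ALIGN() rounds a size up to the cache-line size, while SKB_TRUESIZE() also adds the aligned sizes of struct sk_buff and struct skb_shared_info, so for the same argument SKB_TRUESIZE() yields a larger value by that metadata overhead. A userspace sketch of the two computations; the 64-byte cache line and the 224- and 320-byte structure sizes are illustrative assumptions, not values from any particular build:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative assumptions; real values depend on the kernel build. */
#define DEMO_SMP_CACHE_BYTES	64
#define DEMO_SIZEOF_SK_BUFF	224
#define DEMO_SIZEOF_SHINFO	320

static_assert((DEMO_SMP_CACHE_BYTES & (DEMO_SMP_CACHE_BYTES - 1)) == 0,
	      "cache line size is a power of two");

/* Like SKB_DATA_ALIGN(): round up to a multiple of the cache-line size. */
static size_t demo_skb_data_align(size_t x)
{
	return (x + DEMO_SMP_CACHE_BYTES - 1) / DEMO_SMP_CACHE_BYTES *
	       DEMO_SMP_CACHE_BYTES;
}

/* Like SKB_TRUESIZE(): x plus the aligned sk_buff and shared-info overhead. */
static size_t demo_skb_truesize(size_t x)
{
	return x + demo_skb_data_align(DEMO_SIZEOF_SK_BUFF) +
	       demo_skb_data_align(DEMO_SIZEOF_SHINFO);
}
```

      With these assumed sizes, a 1000-byte argument gives 1024 bytes via the data-align computation but 1576 bytes via the truesize computation.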

      Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Fixes: 5e10da53 ("skbuff: allow 'slow_gro' for skb carring sock reference")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipa-runtime-pm' · 83945480
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: more work toward runtime PM
      
      The first two patches in this series are basically bug fixes, but in
      practice I don't think we've seen the problems they might cause.
      
      The third patch moves clock and interconnect related error messages
      around a bit, reporting better information and doing so in the
      functions where they are enabled or disabled (rather than those
      functions' callers).
      
      The last three patches move power-related code into "ipa_clock.c",
      as a step toward generalizing the purpose of that source file.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: move IPA flags field · afb08b7e
      Alex Elder authored
      The ipa->flags field is only ever used in "ipa_clock.c", related to
      suspend/resume activity.
      
      Move the definition of the ipa_flag enumerated type to "ipa_clock.c".
      Also move the flags field from the ipa structure to the ipa_clock
      structure.  Rename the type and its values to include "power" or
      "POWER" in the name.

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: move ipa_suspend_handler() · afe1baa8
      Alex Elder authored
      Move ipa_suspend_handler() into "ipa_clock.c" from "ipa_main.c", to
      group it with the rest of the suspend/resume code.  This IPA interrupt
      is triggered if an IPA RX endpoint is suspended but has a packet to
      be delivered.
      
      Introduce ipa_power_setup() and ipa_power_teardown() to add and
      remove the handler for the IPA SUSPEND interrupt at the same place
      as before, while allowing the handler to remain private.
      
      The "power" naming convention will be adopted elsewhere in this
      file as well (soon).
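
      The setup/teardown pattern lets the handler stay static while still being registered from outside its own file. A hypothetical sketch, in which a plain function-pointer table stands in for the IPA interrupt code and none of the demo_* names are kernel APIs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interrupt IDs and handler table owned by an "interrupt" module. */
enum demo_irq_id { DEMO_IRQ_SUSPEND, DEMO_IRQ_COUNT };

typedef void (*demo_irq_handler_t)(void *data);

static demo_irq_handler_t demo_handlers[DEMO_IRQ_COUNT];

static void demo_irq_add(enum demo_irq_id id, demo_irq_handler_t handler)
{
	demo_handlers[id] = handler;
}

static void demo_irq_remove(enum demo_irq_id id)
{
	demo_handlers[id] = NULL;
}

/* Returns 1 if a handler ran, 0 if none was registered. */
static int demo_irq_dispatch(enum demo_irq_id id, void *data)
{
	if (!demo_handlers[id])
		return 0;
	demo_handlers[id](data);
	return 1;
}

/* "Power" module: the handler stays static (file-private)... */
static void demo_suspend_handler(void *data)
{
	int *wakeups = data;

	assert(wakeups != NULL);
	(*wakeups)++;		/* e.g. note that a wakeup was requested */
}

/* ...and only setup/teardown are exported to register and unregister it. */
void demo_power_setup(void)
{
	demo_irq_add(DEMO_IRQ_SUSPEND, demo_suspend_handler);
}

void demo_power_teardown(void)
{
	demo_irq_remove(DEMO_IRQ_SUSPEND);
}
```

      Only demo_power_setup() and demo_power_teardown() have external linkage; demo_suspend_handler() is reachable solely through the table.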

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: move IPA power operations to ipa_clock.c · 73ff316d
      Alex Elder authored
      Move ipa_suspend() and ipa_resume(), as well as the definition of
      the ipa_pm_ops structure into "ipa_clock.c".  Make ipa_pm_ops public
      and declare it as extern in "ipa_clock.h".
      
      This is part of centralizing IPA power management functionality into
      "ipa_clock.c" (the file will eventually get a name change).

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: improve IPA clock error messages · 8ee7c40a
      Alex Elder authored
      Rearrange messages reported when errors occur in the IPA clock code,
      so that the specific interconnect is identified when an error occurs
      enabling or disabling it, or the core clock is indicated when an
      error occurs enabling it.
      
      Have ipa_interconnect_disable() return zero or the negative error
      value returned by the first interconnect that produced an error
      when disabled.  For now, the callers ignore the returned value.
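
      Returning the first error while still attempting every disable can be sketched as follows, assuming (as the description implies) that the loop keeps going after a failure. The demo_interconnect type and the fake callbacks are hypothetical; a real driver would also log each failure with the interconnect's name:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical interconnect: disable() returns 0 or a negative errno. */
struct demo_interconnect {
	const char *name;
	int (*disable)(struct demo_interconnect *ic);
	int enabled;
};

/* Fake callbacks: each releases the path, some also report an error. */
static int demo_disable_ok(struct demo_interconnect *ic)
{
	ic->enabled = 0;
	return 0;
}

static int demo_disable_eio(struct demo_interconnect *ic)
{
	ic->enabled = 0;
	return -EIO;
}

static int demo_disable_etimedout(struct demo_interconnect *ic)
{
	ic->enabled = 0;
	return -ETIMEDOUT;
}

/* Disable all interconnects; return 0 or the first error encountered. */
static int demo_interconnect_disable(struct demo_interconnect *ics, size_t count)
{
	int result = 0;

	assert(ics != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		int ret = ics[i].disable(&ics[i]);

		if (ret && !result)
			result = ret;	/* remember only the first failure */
	}

	return result;
}
```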

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: reorder netdev pointer assignments · 10cc73c4
      Alex Elder authored
      Assign the ipa->modem_netdev and endpoint->netdev pointers *before*
      registering the network device.  As soon as the device is
      registered it can be opened, and by that time we'll want those
      pointers valid.
      
      Similarly, don't make those pointers NULL until *after* the modem
      network device is unregistered in ipa_modem_stop().
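
      The ordering concern can be modeled by treating registration as the point after which the device may be opened at any time. In this hypothetical sketch only one pointer (standing in for ipa->modem_netdev) is modeled, and demo_register() simulates the worst case by opening the device immediately:

```c
#include <assert.h>
#include <stddef.h>

struct demo_netdev;

/* Hypothetical driver state: a pointer the open path relies on. */
struct demo_ipa {
	struct demo_netdev *modem_netdev;
};

struct demo_netdev {
	struct demo_ipa *ipa;
	int opened_with_valid_ptr;
};

/* The open path expects the driver's pointer to be set already. */
static void demo_open(struct demo_netdev *netdev)
{
	netdev->opened_with_valid_ptr = netdev->ipa->modem_netdev == netdev;
}

static void demo_register(struct demo_netdev *netdev)
{
	demo_open(netdev);	/* worst case: opened right after registering */
}

static void demo_unregister(struct demo_netdev *netdev)
{
	(void)netdev;		/* after this returns, no new opens can happen */
}

static void demo_modem_start(struct demo_ipa *ipa, struct demo_netdev *netdev)
{
	netdev->ipa = ipa;
	ipa->modem_netdev = netdev;	/* assign *before* registering */
	demo_register(netdev);
}

static void demo_modem_stop(struct demo_ipa *ipa)
{
	struct demo_netdev *netdev = ipa->modem_netdev;

	assert(netdev != NULL);
	demo_unregister(netdev);
	ipa->modem_netdev = NULL;	/* clear only *after* unregistering */
}
```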

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: don't suspend/resume modem if not up · 30c2515b
      Alex Elder authored
      The modem network device is set up by ipa_modem_start().  But its
      TX queue is not actually started, nor are its endpoints enabled,
      until it is opened.
      
      So avoid stopping the modem network device TX queue and disabling
      endpoints on suspend or stop unless the netdev is marked UP.  And
      skip attempting to resume unless it is UP.

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sja1105-H' · 1f52247e
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      NXP SJA1105 driver support for "H" switch topologies
      
      Changes in v3:
      Preserve the behavior of dsa_tree_setup_default_cpu() which is to pick
      the first CPU port and not the last.
      
      Changes in v2:
      Send as non-RFC, drop the patches for discarding DSA-tagged packets on
      user ports and DSA-untagged packets on DSA and CPU ports for now.
      
      NXP builds boards like the Bluebox 3 where there are multiple SJA1110
      switches connected to an LX2160A, but they are also connected to each
      other. I call this topology an "H" tree because of the lateral
      connection between switches. A piece extracted from a non-upstream
      device tree looks like this:
      
      &spi_bridge {
              /* SW1 */
              ethernet-switch@0 {
                      compatible = "nxp,sja1110a";
                      reg = <0>;
                      dsa,member = <0 0>;
      
                      ethernet-ports {
                              #address-cells = <1>;
                              #size-cells = <0>;
      
                              /* SW1_P1 */
                              port@1 {
                                      reg = <1>;
                                      label = "con_2x20";
                                      phy-mode = "sgmii";
      
                                      fixed-link {
                                              speed = <1000>;
                                              full-duplex;
                                      };
                              };
      
                              port@2 {
                                      reg = <2>;
                                      ethernet = <&dpmac17>;
                                      phy-mode = "rgmii-id";
      
                                      fixed-link {
                                              speed = <1000>;
                                              full-duplex;
                                      };
                              };
      
                              port@3 {
                                      reg = <3>;
                                      label = "1ge_p1";
                                      phy-mode = "rgmii-id";
                                      phy-handle = <&sw1_mii3_phy>;
                              };
      
                              sw1p4: port@4 {
                                      reg = <4>;
                                      link = <&sw2p1>;
                                      phy-mode = "sgmii";
      
                                      fixed-link {
                                              speed = <1000>;
                                              full-duplex;
                                      };
                              };
      
                              port@5 {
                                      reg = <5>;
                                      label = "trx1";
                                      phy-mode = "internal";
                                      phy-handle = <&sw1_port5_base_t1_phy>;
                              };
      
                              port@6 {
                                      reg = <6>;
                                      label = "trx2";
                                      phy-mode = "internal";
                                      phy-handle = <&sw1_port6_base_t1_phy>;
                              };
      
                              port@7 {
                                      reg = <7>;
                                      label = "trx3";
                                      phy-mode = "internal";
                                      phy-handle = <&sw1_port7_base_t1_phy>;
                              };
      
                              port@8 {
                                      reg = <8>;
                                      label = "trx4";
                                      phy-mode = "internal";
                                      phy-handle = <&sw1_port8_base_t1_phy>;
                              };
      
                              port@9 {
                                      reg = <9>;
                                      label = "trx5";
                                      phy-mode = "internal";
                                      phy-handle = <&sw1_port9_base_t1_phy>;
                              };
      
                              port@a {
                                      reg = <10>;
                                      label = "trx6";
                                      phy-mode = "internal";
                                      phy-handle = <&sw1_port10_base_t1_phy>;
                              };
                      };
              };
      
              /* SW2 */
              ethernet-switch@2 {
                      compatible = "nxp,sja1110a";
                      reg = <2>;
                      dsa,member = <0 1>;
      
                      ethernet-ports {
                              #address-cells = <1>;
                              #size-cells = <0>;
      
                              sw2p1: port@1 {
                                      reg = <1>;
                                      link = <&sw1p4>;
                                      phy-mode = "sgmii";
      
                                      fixed-link {
                                              speed = <1000>;
                                              full-duplex;
                                      };
                              };
      
                              port@2 {
                                      reg = <2>;
                                      ethernet = <&dpmac18>;
                                      phy-mode = "rgmii-id";
      
                                      fixed-link {
                                              speed = <1000>;
                                              full-duplex;
                                      };
                              };
      
                              port@3 {
                                      reg = <3>;
                                      label = "1ge_p2";
                                      phy-mode = "rgmii-id";
                                      phy-handle = <&sw2_mii3_phy>;
                              };
      
                              port@4 {
                                      reg = <4>;
                                      label = "to_sw3";
                                      phy-mode = "2500base-x";
      
                                      fixed-link {
                                              speed = <2500>;
                                              full-duplex;
                                      };
                              };
      
                              port@5 {
                                      reg = <5>;
                                      label = "trx7";
                                      phy-mode = "internal";
                                      phy-handle = <&sw2_port5_base_t1_phy>;
                              };
      
                              port@6 {
                                      reg = <6>;
                                      label = "trx8";
                                      phy-mode = "internal";
                                      phy-handle = <&sw2_port6_base_t1_phy>;
                              };
      
                              port@7 {
                                      reg = <7>;
                                      label = "trx9";
                                      phy-mode = "internal";
                                      phy-handle = <&sw2_port7_base_t1_phy>;
                              };
      
                              port@8 {
                                      reg = <8>;
                                      label = "trx10";
                                      phy-mode = "internal";
                                      phy-handle = <&sw2_port8_base_t1_phy>;
                              };
      
                              port@9 {
                                      reg = <9>;
                                      label = "trx11";
                                      phy-mode = "internal";
                                      phy-handle = <&sw2_port9_base_t1_phy>;
                              };
      
                              port@a {
                                      reg = <10>;
                                      label = "trx12";
                                      phy-mode = "internal";
                                      phy-handle = <&sw2_port10_base_t1_phy>;
                              };
                      };
              };
      };
      
      Basically it is a single DSA tree with 2 "ethernet" properties, i.e. a
      multi-CPU-port system. There is also a DSA link between the switches,
      but it is not a daisy chain topology, i.e. there is no "upstream" and
      "downstream" switch, the DSA link is only to be used for the bridge data
      plane (autonomous forwarding between switches, between the RJ-45 ports
      and the automotive Ethernet ports), otherwise all traffic that should
      reach the host should do so through the dedicated CPU port of the switch.
      
      Of course, plain forwarding in this topology is bound to create packet
      loops. I have thought long and hard about strategies to cut forwarding
      in such a way as to prevent loops but also not impede normal operation
      of the network on such a system, and I believe I have found a solution
      that does work as expected. This relies heavily on DSA's recent ability
      to perform RX filtering towards the host by installing MAC addresses as
      static FDB entries. Since we have 2 distinct DSA masters, we have 2
      distinct MAC addresses, and if the bridge is configured to have its own
      MAC address that makes it 3 distinct MAC addresses. The bridge core,
      plus the switchdev_handle_fdb_add_to_device() extension, handle each MAC
      address by replicating it to each port of the DSA switch tree. So the
      end result is that both switch 1 and switch 2 will have static FDB
      entries towards their respective CPU ports for the 3 MAC addresses
      corresponding to the DSA masters and to the bridge net device (and of
      course, towards any station learned on a foreign interface).
      
      So I think the basic design works, and it is basically just as fragile
      as any other multi-CPU-port system is bound to be in terms of reliance
      on static FDB entries towards the host (if hardware address learning on
      the CPU ports were used instead, MAC addresses would randomly bounce
      between one CPU port and the other). In fact, I think it is even better
      to start DSA's support of multi-CPU-port systems with something small
      like the NXP Bluebox 3, because it gives code paths like
      dsa_switch_host_address_match(), which were specifically designed for
      it, some time to break in, and this board needs no user space
      configuration of CPU ports, such as static assignments between user and
      CPU ports, or bonding between the CPU ports/DSA masters.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: enable address learning on cascade ports · 81d45898
      Vladimir Oltean authored
      Right now, address learning is disabled on DSA ports, which means that a
      packet received over a DSA port from a cross-chip switch will be flooded
      to unrelated ports.
      
      It is desirable to eliminate that, but for that we need a breakdown of
      the possibilities for the sja1105 driver. A DSA port can be:
      
      - a downstream-facing cascade port. This is simple because it will
        always receive packets from a downstream switch, and there should be
        no other route to reach that downstream switch in the first place,
        which means it should be safe to learn that MAC address towards that
        switch.
      
      - an upstream-facing cascade port. This receives packets either:
        * autonomously forwarded by an upstream switch (and therefore these
          packets belong to the data plane of a bridge, so address learning
          should be ok), or
        * injected from the CPU. This deserves further discussion, as normally,
          an upstream-facing cascade port is no different from the CPU port
          itself. But with "H" topologies (a DSA link towards a switch that
          has its own CPU port), these are more "laterally-facing" cascade
          ports than they are "upstream-facing". Here, there is a risk that
          the port might learn the host addresses on the wrong port (on the
          DSA port instead of on its own CPU port), but this is solved by
          DSA's RX filtering infrastructure, which installs the host addresses
          as static FDB entries on the CPU port of all switches in an "H" tree.
          So even if there will be an attempt from the switch to migrate the
          FDB entry from the CPU port to the laterally-facing cascade port, it
          will fail to do that, because the FDB entry that already exists is
          static and cannot migrate. So address learning should be safe for
          this configuration too.
      
      OK, so what about other MAC addresses coming from the host, not
      necessarily the bridge local FDB entries? What about MAC addresses
      dynamically learned on foreign interfaces? Isn't there a risk that
      cascade ports will learn these entries dynamically when they are
      supposed to be delivered towards the CPU port? Yes, there is, and this
      is why we also need to enable the assisted learning feature: to snoop
      for these addresses and write them to hardware as static FDB entries
      towards the CPU, which makes the switch's learning process on the
      cascade ports ineffective for them. With assisted learning enabled,
      hardware learning on the CPU port must be disabled.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: suppress TX packets from looping back in "H" topologies · 0f9b762c
      Vladimir Oltean authored
      H topologies like this one have a problem:
      
               eth0                                                     eth1
                |                                                        |
             CPU port                                                CPU port
                |                        DSA link                        |
       sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
         |             |      |                            |      |             |
       user          user   user                         user   user          user
       port          port   port                         port   port          port
      
      Basically, any packet sent by the eth0 DSA master can be flooded on the
      interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the
      eth1 DSA master too. In effect, we are talking to ourselves.
      
      In VLAN-unaware mode, these packets are encoded using a tag_8021q TX
      VLAN, which dsa_8021q_rcv() rightfully cannot decode, so it complains.
      Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN
      which _can_ be decoded by the tagger running on eth1, so it will attempt
      to reinject that packet into the network stack (the bridge, if there is
      any port under eth1 that is under a bridge). In the case where the ports
      under eth1 are under the same cross-chip bridge as the ports under eth0,
      the TX packets will even be learned as RX packets. The only thing that
      will prevent loops with the software bridging path, and therefore
      disaster, is that the source port and the destination port are in the
      same hardware domain, and the bridge will receive packets from the
      driver with skb->offload_fwd_mark = true and will not forward between
      the two.
      
      The proper solution to this problem is to detect H topologies and
      enforce that all packets are received through the local switch and we do
      not attempt to receive packets on our CPU port from switches that have
      their own. This is a viable solution which works thanks to the fact that
      MAC addresses which should be filtered towards the host are installed by
      DSA as static MAC addresses towards the CPU port of each switch.
      
      TX from a CPU port towards the DSA port continues to be allowed. This is
      because sja1105 supports bridge TX forwarding offload, and the skb->dev
      used initially for xmit has no direct correlation with where the
      station that will respond to that packet is connected. It may very
      well happen that when we send a ping through a br0 interface that spans
      all switch ports, the xmit packet will exit the system through a DSA
      switch interface under eth1 (say sw1p2), but the destination station is
      connected to a switch port under eth0, like sw0p0. So the switch under
      eth1 needs to communicate on TX with the switch under eth0. The
      response, however, will not follow the same path: this patch enforces
      that the response is sent by the first switch directly to its DSA
      master, which is eth0.
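      The forwarding restriction can be modeled as a predicate over the
      ingress and egress ports of one switch. This is a simplified,
      hypothetical sketch: struct port_desc, may_forward() and the
      remote_has_cpu flag are made up, and bridging domains are ignored. It
      only encodes the rule above: traffic arriving over an "H" DSA link
      never reaches the local CPU port, while TX from the CPU port towards
      the DSA link stays allowed.

```c
#include <assert.h>
#include <stdbool.h>

enum port_role { PORT_USER, PORT_CPU, PORT_DSA };

/* Hypothetical view of one port of the local switch. For a DSA link,
 * remote_has_cpu says whether the switch at the other end has its own
 * CPU port, i.e. whether the link is part of an "H" topology. */
struct port_desc {
	enum port_role role;
	bool remote_has_cpu;
};

/* May a packet received on port @from be forwarded out of port @to?
 * Bridging domains are ignored; only the "H" rule is modeled. */
static bool may_forward(const struct port_desc *ports, int nports,
			int from, int to)
{
	assert(from >= 0 && from < nports && to >= 0 && to < nports);
	if (from == to)
		return false;
	/* RX from a switch that has its own CPU port must not reach ours:
	 * that switch delivers host-bound traffic to its own DSA master. */
	if (ports[from].role == PORT_DSA && ports[from].remote_has_cpu &&
	    ports[to].role == PORT_CPU)
		return false;
	/* Everything else, including CPU port -> DSA link (TX), is allowed. */
	return true;
}
```

      For switch 1 in the diagram (sw1p1 is the CPU port, sw1p4 the DSA
      link), this forbids sw1p4 -> sw1p1 but allows sw1p1 -> sw1p4 and
      sw1p4 -> sw1p2.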
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: increase MTU to account for VLAN header on DSA ports · 777e55e3
      Vladimir Oltean authored
      Since all packets are transmitted as VLAN-tagged over a DSA link (this
      VLAN tag represents the tag_8021q header), we need to increase the MTU
      of these interfaces to account for the possibility that we are already
      transporting a user-visible VLAN header.
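      As a worked example of the arithmetic, assuming standard Ethernet
      framing (14-byte header, 4-byte FCS, 4-byte 802.1Q tag, the same values
      as the kernel's ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN): a frame with a
      1500-byte MTU payload that already carries a user VLAN tag is 1522
      bytes on the wire, and the extra tag_8021q VLAN on the DSA link makes
      it 1526 bytes, so the DSA port needs an MTU of 1504. The helper names
      below are hypothetical.

```c
#include <assert.h>

/* Byte counts matching the kernel's ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN,
 * redefined under local names so the sketch is self-contained. */
enum { ETH_HDR_BYTES = 14, ETH_FCS_BYTES = 4, VLAN_TAG_BYTES = 4 };

/* MTU for a DSA link so that a full-size frame that already carries a
 * user-visible VLAN tag still fits once the tag_8021q VLAN is added. */
static int dsa_link_mtu(int user_mtu)
{
	assert(user_mtu > 0);
	return user_mtu + VLAN_TAG_BYTES;
}

/* On-wire frame length for a payload of @mtu bytes carrying @vlan_tags tags. */
static int frame_len(int mtu, int vlan_tags)
{
	assert(mtu > 0 && vlan_tags >= 0);
	return ETH_HDR_BYTES + vlan_tags * VLAN_TAG_BYTES + mtu + ETH_FCS_BYTES;
}
```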
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: manage VLANs on cascade ports · c5130029
      Vladimir Oltean authored
      Since commit ed040abc ("net: dsa: sja1105: use 4095 as the private
      VLAN for untagged traffic"), this driver uses a reserved value as pvid
      for the host port (DSA CPU port). Control packets which are sent as
      untagged get classified to this VLAN, and all ports are members of it
      (this is to be expected for control packets).
      
      Manage all cascade ports in the same way and allow control packets to
      egress everywhere.
      
      Also, all VLANs need to be sent as egress-tagged on all cascade ports.
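      A hypothetical sketch of that VLAN policy, applied to a simplified VLAN
      table entry with one bit per port, follows. struct vlan_entry, the
      SJA1105_PRIVATE_PVID macro and vlan_apply_cascade_policy() are
      illustrative, not the driver's real VLAN table layout or API.

```c
#include <assert.h>
#include <stdint.h>

#define SJA1105_PRIVATE_PVID 4095 /* reserved pvid for untagged traffic */

/* Hypothetical, simplified VLAN table entry with one bit per port. */
struct vlan_entry {
	uint16_t vid;
	uint32_t member; /* ports that may send/receive in this VLAN */
	uint32_t tagged; /* ports where this VLAN egresses with a tag */
};

/* Adjust an entry that already holds the bridge-requested membership:
 * all ports (cascade ports included) are members of the private VLAN,
 * and every VLAN is egress-tagged on every cascade port that is a member. */
static void vlan_apply_cascade_policy(struct vlan_entry *v,
				      uint32_t all_ports, uint32_t cascade_ports)
{
	assert((cascade_ports & ~all_ports) == 0);
	if (v->vid == SJA1105_PRIVATE_PVID)
		v->member = all_ports; /* control packets may egress anywhere */
	v->tagged |= cascade_ports & v->member;
}
```

      With five ports and port 4 (mask 0x10) as the cascade port, VLAN 4095
      ends up with all ports as members and tagged on port 4; a bridge VLAN
      keeps its membership and is tagged on port 4 only if port 4 is a member.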
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: manage the forwarding domain towards DSA ports · 3fa21270
      Vladimir Oltean authored
      Manage DSA links towards other switches, be they host ports or cascade
      ports, the same as the CPU port, i.e. allow forwarding and flooding
      unconditionally from all user ports.
      
      We always send packets VLAN-tagged on a DSA port, and we rely on the
      cross-chip notifiers from tag_8021q to install the RX VLAN of a switch
      port only on the proper remote ports of another switch (the ports that
      are in the same bridging domain). So if there is no cross-chip bridging
      in the system, the flooded packets will be sent on the DSA ports too,
      but they will be dropped by the remote switches due to either
      (a) a lack of the RX VLAN in the VLAN table of the ingress DSA port, or
      (b) a lack of valid destinations for those packets, due to a lack of the
          RX VLAN on the user ports of the switch.
      
      Note that switches which only transport packets in a cross-chip bridge,
      but have no user ports of their own as part of that bridge, such as
      switch 1 in this case:
      
                          DSA link                   DSA link
        sw0p0 sw0p1 sw0p2 -------- sw1p0 sw1p2 sw1p3 -------- sw2p0 sw2p2 sw2p3
      
      ip link set sw0p0 master br0
      ip link set sw2p3 master br0
      
      will still work, because the tag_8021q cross-chip notifiers keep the RX
      VLANs installed on all DSA ports.
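      The two drop reasons, and the transit case, can be illustrated with a
      model of how a remote switch treats a flooded frame arriving on its DSA
      port. The table layout, the at-most-32-port limit and the VID values in
      the example are assumptions made for the sketch; they are not the
      tag_8021q VLAN numbering.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VIDS 4096

/* Hypothetical VLAN table of the remote switch: bit N of vlan_member[vid]
 * set means port N is a member of that VLAN (at most 32 ports). */
static uint32_t vlan_member[NUM_VIDS];

enum rx_verdict {
	RX_DROP_NO_VLAN_ON_INGRESS, /* case (a) */
	RX_DROP_NO_DESTINATION,     /* case (b) */
	RX_FORWARD,
};

/* Fate of a flooded frame arriving on DSA port @ingress, tagged with the
 * tag_8021q RX VLAN @rx_vid of the sending port. */
static enum rx_verdict remote_rx(int ingress, uint16_t rx_vid)
{
	uint32_t others;

	assert(ingress >= 0 && ingress < 32 && rx_vid < NUM_VIDS);
	if (!(vlan_member[rx_vid] & (1u << ingress)))
		return RX_DROP_NO_VLAN_ON_INGRESS;
	/* Destinations are any other member ports, user or DSA, so a transit
	 * switch with no user ports in the bridge still forwards the frame. */
	others = vlan_member[rx_vid] & ~(1u << ingress);
	if (!others)
		return RX_DROP_NO_DESTINATION;
	return RX_FORWARD;
}
```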
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: configure the cascade ports based on topology · 30a100e6
      Vladimir Oltean authored
      The sja1105 switch family has a feature called "cascade ports" which can
      be used in topologies where multiple SJA1105/SJA1110 switches are
      daisy-chained. Upstream switches set this bit for the DSA link towards
      the downstream switches. This matters when the upstream switch receives
      a control packet (PTP, STP) from a downstream switch: if the source port
      for a control packet is marked as a cascade port, then the source port,
      switch ID and RX timestamp will not be taken again on the upstream
      switch. It is assumed that this has already been done by the downstream
      switch (the leaf port in the tree) and that the CPU has everything it
      needs to decode the information from this packet.
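      The effect of the cascade bit on control packet metadata can be
      sketched as follows. struct ctrl_meta and ctrl_rx() are hypothetical,
      and how the hardware actually conveys this metadata to the CPU is not
      modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metadata the CPU needs for a received control packet. */
struct ctrl_meta {
	int src_port;
	int switch_id;
	uint64_t rx_tstamp;
};

/* A switch receives a control packet on @port at time @now. On a cascade
 * port, the downstream switch already recorded the metadata, so it is kept;
 * otherwise this switch records its own. */
static void ctrl_rx(struct ctrl_meta *meta, bool port_is_cascade,
		    int switch_id, int port, uint64_t now)
{
	assert(meta);
	if (port_is_cascade)
		return;
	meta->src_port = port;
	meta->switch_id = switch_id;
	meta->rx_tstamp = now;
}
```

      A frame received on switch 1 port 2 at time 1000 and then relayed over
      switch 0's cascade port keeps the metadata recorded by switch 1.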
      
      We need to distinguish between an upstream-facing DSA link and a
      downstream-facing DSA link, because the upstream-facing DSA links are
      "host ports" for the SJA1105/SJA1110 switches, and the downstream-facing
      DSA links are "cascade ports".
      
      Note that SJA1105 supports a single cascade port, so only daisy chain
      topologies work. With SJA1110, there can be more complex topologies such
      as:
      
                          eth0
                           |
                       host port
                           |
       sw0p0    sw0p1    sw0p2    sw0p3    sw0p4
         |        |                 |        |
       cascade  cascade            user     user
        port     port              port     port
         |        |
         |        |
         |        |
         |       host
         |       port
         |        |
         |      sw1p0    sw1p1    sw1p2    sw1p3    sw1p4
         |                 |        |        |        |
         |                user     user     user     user
        host              port     port     port     port
        port
         |
       sw2p0    sw2p1    sw2p2    sw2p3    sw2p4
                  |        |        |        |
                 user     user     user     user
                 port     port     port     port
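      The distinction between host ports and cascade ports could be
      expressed as below, in a hypothetical sketch whose types and helpers
      are illustrative only. Applied to switch 0 in the diagram (two
      downstream DSA links, sw0p0 and sw0p1), it reports two cascade ports,
      which exceeds the single cascade port of SJA1105; the SJA1110 limit is
      not given in the text, so the example simply uses 2.

```c
#include <assert.h>
#include <stdbool.h>

enum port_role { PORT_USER, PORT_CPU, PORT_DSA };

/* Hypothetical port descriptor. For DSA links, is_upstream tells whether
 * the link leads towards the CPU (what a driver could learn by comparing
 * the port with DSA's upstream port). */
struct port_desc {
	enum port_role role;
	bool is_upstream;
};

enum link_kind { LINK_NONE, LINK_HOST_PORT, LINK_CASCADE_PORT };

/* CPU ports and upstream-facing DSA links are host ports; downstream-facing
 * DSA links are cascade ports; user ports are neither. */
static enum link_kind classify(const struct port_desc *p)
{
	if (p->role == PORT_CPU)
		return LINK_HOST_PORT;
	if (p->role == PORT_DSA)
		return p->is_upstream ? LINK_HOST_PORT : LINK_CASCADE_PORT;
	return LINK_NONE;
}

/* Does this switch need no more than @max_cascade cascade ports?
 * The text gives max_cascade = 1 for SJA1105. */
static bool cascade_ports_fit(const struct port_desc *ports, int n,
			      int max_cascade)
{
	int cascade = 0;

	assert(n >= 0 && max_cascade >= 0);
	for (int i = 0; i < n; i++)
		if (classify(&ports[i]) == LINK_CASCADE_PORT)
			cascade++;
	return cascade <= max_cascade;
}
```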
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: give preference to local CPU ports · 2c0b0325
      Vladimir Oltean authored
      Consider an "H" switch topology, where 2 switches are connected as
      follows:
      
               eth0                                                     eth1
                |                                                        |
             CPU port                                                CPU port
                |                        DSA link                        |
       sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
         |             |      |                            |      |             |
       user          user   user                         user   user          user
       port          port   port                         port   port          port
      
      basically one where each switch has its own CPU port for termination,
      but there is also a DSA link in case packets need to be forwarded in
      hardware between one switch and another.
      
      DSA insists on seeing this as a daisy chain topology, basically
      registering all network interfaces as sw0p0@eth0, ... sw1p0@eth0 and
      disregarding eth1 as a valid DSA master.
      
      This is only half the story, since when asked using dsa_port_is_cpu(),
      DSA will respond that sw1p1 is a CPU port, however one which has no
      dp->cpu_dp pointing to it. So sw1p1 is enabled, but not used.
      
      Furthermore, consider a driver for switches which support only one
      upstream port. This driver iterates through its ports and checks using
      dsa_is_upstream_port() whether the current port is an upstream one.
      For switch 1, two ports pass the "is upstream port" checks:
      
      - sw1p4 is an upstream port because it is a routing port towards the
        dedicated CPU port assigned using dsa_tree_setup_default_cpu()
      
      - sw1p1 is also an upstream port because it is a CPU port, albeit one
        that is disabled. This is because dsa_upstream_port() returns:
      
      	if (!cpu_dp)
      		return port;
      
        which means that if @dp does not have a ->cpu_dp pointer (which is a
        characteristic of CPU ports themselves as well as unused ports), then
        @dp is its own upstream port.
      
      So the driver for switch 1 rightfully says: I have two upstream ports,
      but I don't support multiple upstream ports! So let me error out, I
      don't know which one to choose and what to do with the other one.
      
      Generally I am against enforcing any default policy in the kernel in
      terms of user to CPU port assignment (like round robin or such) but this
      case is different. To solve the conundrum, one would have to:
      
      - Disable sw1p1 in the device tree or mark it as "not a CPU port" in
        order to comply with DSA's view of this topology as a daisy chain,
        where the termination traffic from switch 1 must pass through switch 0.
        This is counter-productive because it wastes 1Gbps of termination
        throughput in switch 1.
      - Disable the DSA link between sw0p4 and sw1p4 and do software
        forwarding between switch 0 and 1, and basically treat the switches as
        part of disjoint switch trees. This is counter-productive because it
        wastes 1Gbps of autonomous forwarding throughput between switch 0 and 1.
      - Treat sw0p4 and sw1p4 as user ports instead of DSA links. This could
        work, but it makes cross-chip bridging impossible. In this setup we
        would need to have 2 separate bridges, br0 spanning the ports of
        switch 0, and br1 spanning the ports of switch 1, and the "DSA links
        treated as user ports" sw0p4 (part of br0) and sw1p4 (part of br1) are
        the gateway ports between one bridge and another. This is hard to
        manage from a user's perspective, who wants to have a unified view of
        the switching fabric and the ability to transparently add ports to the
        same bridge. VLANs would also need to be explicitly managed by the
        user on these gateway ports.
      
      So it seems that the only reasonable thing to do is to make DSA prefer
      CPU ports that are local to the switch. Meaning that by default, the
      user and DSA ports of switch 0 will get assigned to the CPU port from
      switch 0 (sw0p1) and the user and DSA ports of switch 1 will get
      assigned to the CPU port from switch 1.
      
      The way this solves the problem is that sw1p4 is no longer an upstream
      port as far as switch 1 is concerned (it no longer views sw0p1 as its
      dedicated CPU port).
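      Both the problem and the fix can be reproduced with a simplified model
      of dsa_upstream_port(). This is not the kernel code: struct
      model_port is hypothetical and unused ports are left out. Switch 1
      initially has its DSA link (sw1p4) and its unused CPU port (sw1p1) both
      reported as upstream; once every other port of switch 1 points at the
      local CPU port, only sw1p1 remains.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of a DSA port, keeping only what is
 * needed to reproduce the upstream-port check for one switch. */
struct model_port {
	int index;                       /* port number within its switch */
	const struct model_port *cpu_dp; /* dedicated CPU port, or NULL */
	int towards_cpu;                 /* local port that routes to cpu_dp */
};

/* Same shape as dsa_upstream_port(): a port without ->cpu_dp is its own
 * upstream port, otherwise the upstream port is the local port that
 * routes towards the dedicated CPU port. */
static int model_upstream_port(const struct model_port *dp)
{
	if (!dp->cpu_dp)
		return dp->index;
	return dp->towards_cpu;
}

/* Number of ports a driver would see as "upstream" (port == upstream). */
static int count_upstream_ports(const struct model_port *ports, int n)
{
	int count = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (model_upstream_port(&ports[i]) == ports[i].index)
			count++;
	return count;
}
```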
      
      So here we are: the first multi-CPU-port setup that DSA supports is also
      perhaps the most uneventful one. The individual switches don't support
      multiple CPUs, but the DSA switch tree as a whole does have multiple
      CPU ports. No user space assignment of user ports to CPU ports is
      desirable, necessary, or possible.
      
      Ports that do not have a local CPU port (say there was an extra switch
      hanging off of sw0p0) default to the standard implementation of getting
      assigned to the first CPU port of the DSA switch tree. Is that good
      enough? Probably not (if the downstream switch was hanging off of switch
      1, we would most certainly prefer its CPU port to be sw1p1), but in
      order to support that use case too, we would need to traverse the
      dst->rtable in search of an optimum dedicated CPU port, one that has the
      smallest number of hops between dp->ds and dp->cpu_dp->ds. At the
      moment, the DSA routing table structure does not keep the number of hops
      between dl->dp and dl->link_dp, and while it is probably deducible,
      there is zero justification to write that code now. Let's hope DSA will
      never have to support that use case.
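      The default assignment policy described above (the local CPU port if
      the switch has one, otherwise the first CPU port of the tree) could be
      sketched over a flattened list of ports. struct tree_port and
      assign_cpu_ports() are hypothetical, and the hop-count optimization
      discussed above is deliberately not attempted.

```c
#include <assert.h>
#include <stddef.h>

enum port_role { PORT_UNUSED, PORT_USER, PORT_CPU, PORT_DSA };

/* Hypothetical flattened view of a DSA switch tree, ports listed switch
 * by switch in tree order. Not DSA's real data structures. */
struct tree_port {
	int sw;                   /* index of the switch owning this port */
	enum port_role role;
	struct tree_port *cpu_dp; /* assigned CPU port, NULL if none */
};

/* Give every user and DSA port the CPU port of its own switch if one
 * exists, else the first CPU port of the tree. Returns -1 if the tree
 * has no CPU port at all. */
static int assign_cpu_ports(struct tree_port *ports, int n)
{
	struct tree_port *first_cpu = NULL;

	assert(n >= 0);
	for (int i = 0; i < n && !first_cpu; i++)
		if (ports[i].role == PORT_CPU)
			first_cpu = &ports[i];
	if (!first_cpu)
		return -1;

	for (int i = 0; i < n; i++) {
		struct tree_port *dp = &ports[i];

		if (dp->role != PORT_USER && dp->role != PORT_DSA)
			continue;
		dp->cpu_dp = first_cpu; /* fallback: no local CPU port */
		for (int j = 0; j < n; j++) {
			if (ports[j].role == PORT_CPU && ports[j].sw == dp->sw) {
				dp->cpu_dp = &ports[j]; /* prefer the local one */
				break;
			}
		}
	}
	return 0;
}
```

      In the "H" example, every port of switch 1 is assigned sw1p1, while a
      port on a hypothetical third switch without a CPU port falls back to
      the first CPU port of the tree.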
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: rename teardown_default_cpu to teardown_cpu_ports · 0e8eb9a1
      Vladimir Oltean authored
      Nothing that dsa_tree_teardown_default_cpu() does is specific to having
      a default CPU port. Even with multiple CPU ports, it would do the same
      thing: iterate through the ports of this switch tree and reset the
      ->cpu_dp pointer to NULL. So rename it accordingly.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: fix IPA v4.9 interconnects · 0fd75f57
      Alex Elder authored
      Three interconnects are defined for IPA version 4.9, but there
      should only be two.  They should also use names that match what's
      used for other platforms (and specified in the Device Tree binding).
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mctp: remove duplicated assignment of pointer hdr · df7ba0eb
      Colin Ian King authored
      The pointer hdr is being initialized and also re-assigned with the
      same value from the call to function mctp_hdr. Static analysis reports
      that the initialized value is unused. The second assignment is
      duplicated and can be removed.
      
      Addresses-Coverity: ("Unused value").
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Aug, 2021 15 commits
    • net: Replace deprecated CPU-hotplug functions. · 372bbdd5
      Sebastian Andrzej Siewior authored
      The functions get_online_cpus() and put_online_cpus() have been
      deprecated during the CPU hotplug rework. They map directly to
      cpus_read_lock() and cpus_read_unlock().
      
      Replace deprecated CPU-hotplug functions with the official version.
      The behavior remains unchanged.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • virtio_net: Replace deprecated CPU-hotplug functions. · a0d1d0f4
      Sebastian Andrzej Siewior authored
      The functions get_online_cpus() and put_online_cpus() have been
      deprecated during the CPU hotplug rework. They map directly to
      cpus_read_lock() and cpus_read_unlock().
      
      Replace deprecated CPU-hotplug functions with the official version.
      The behavior remains unchanged.
      
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pktgen: Remove redundant clone_skb override · c2eecaa1
      Nick Richardson authored
      When the netif_receive xmit_mode is set, a line is supposed to set
      clone_skb to a default 0 value. This line is made redundant due to a
      preceding line that checks if clone_skb is more than zero and returns
      -ENOTSUPP.
      
      Overriding clone_skb to 0 does not make any difference to the behavior,
      because if it was positive we return an error. So it can only be 0 or
      negative, and in both cases the behavior is the same.
      
      Remove redundant line that sets clone_skb to zero.
      Signed-off-by: Nick Richardson <richardsonnick@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Expose various resources on the timecard. · 773bda96
      Jonathan Lemon authored
      The OpenCompute timecard driver has additional functionality besides
      a clock.  Make the following resources available:
      
       - The external timestamp channels (ts0/ts1)
       - devlink support for flashing and health reporting
       - GPS and MAC serial ports
       - board serial number (obtained from i2c device)
      
      Also add watchdog functionality for when GNSS goes into holdover.
      
      The resources are collected under a timecard class directory:
      
        [jlemon@timecard ~]$ ls -g /sys/class/timecard/ocp1/
        total 0
        -r--r--r--. 1 root 4096 Aug  3 19:49 available_clock_sources
        -rw-r--r--. 1 root 4096 Aug  3 19:49 clock_source
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 device -> ../../../0000:04:00.0/
        -r--r--r--. 1 root 4096 Aug  3 19:49 gps_sync
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 i2c -> ../../xiic-i2c.1024/i2c-2/
        drwxr-xr-x. 2 root    0 Aug  3 19:49 power/
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 pps -> ../../../../../virtual/pps/pps1/
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 ptp -> ../../ptp/ptp2/
        -r--r--r--. 1 root 4096 Aug  3 19:49 serialnum
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 subsystem -> ../../../../../../class/timecard/
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 ttyGPS -> ../../tty/ttyS7/
        lrwxrwxrwx. 1 root    0 Aug  3 19:49 ttyMAC -> ../../tty/ttyS8/
        -rw-r--r--. 1 root 4096 Aug  3 19:39 uevent
      
      At a minimum, the labeling is needed in order to tell the serial
      devices apart.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: allow reading and changing sk_userlocks with setsockopt · 04190bf8
      Pavel Tikhomirov authored
      SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable the automatic socket
      buffer adjustment done by the kernel (see tcp_fixup_rcvbuf() and
      tcp_sndbuf_expand()). On a newly created socket this adjustment is
      enabled, but if one changes the socket buffer size with
      setsockopt(SO_{SND,RCV}BUF*), it becomes disabled.
      
      CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
      restore, as it first needs to increase buffer sizes to restore the
      packet queues, and then it needs to restore the original buffer sizes.
      So after a CRIU restore, all sockets become non-auto-adjustable, which
      can significantly decrease the network performance of restored
      applications.
      
      CRIU needs to be able to restore sockets with adjustment enabled or
      disabled, matching the state they were in before the dump, so let's add
      a special setsockopt for it.
      
      Let's also export the SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to the
      uAPI so that, using this interface, one can re-enable automatic socket
      buffer adjustment on their sockets.
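      A hedged sketch of how a restore tool might use the new option. To our
      understanding the socket option added here is SO_BUF_LOCK (72 in the
      asm-generic/socket.h numbering, with different values on a few
      architectures) and the flags come from <linux/socket.h>; the fallback
      defines below are assumptions for older userspace headers, and
      buf_lock_mask(), restore_buf_locks() and read_buf_locks() are
      hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Fallbacks for userspace headers that predate this change. The values
 * follow the generic (asm-generic) ABI; a few architectures differ. */
#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72
#endif
#ifndef SOCK_SNDBUF_LOCK
#define SOCK_SNDBUF_LOCK 1
#endif
#ifndef SOCK_RCVBUF_LOCK
#define SOCK_RCVBUF_LOCK 2
#endif

/* Build the lock mask: a set bit disables automatic tuning of that buffer. */
static int buf_lock_mask(bool lock_snd, bool lock_rcv)
{
	return (lock_snd ? SOCK_SNDBUF_LOCK : 0) |
	       (lock_rcv ? SOCK_RCVBUF_LOCK : 0);
}

/* Restore the dumped lock state after SO_SNDBUF/SO_RCVBUF were set, so that
 * sockets that were auto-tuned before the dump become auto-tuned again. */
static int restore_buf_locks(int fd, int mask)
{
	assert((mask & ~(SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK)) == 0);
	return setsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &mask, sizeof(mask));
}

/* Read back the current lock state; returns -1 on error. */
static int read_buf_locks(int fd)
{
	int mask = 0;
	socklen_t len = sizeof(mask);

	if (getsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &mask, &len) < 0)
		return -1;
	return mask;
}
```

      A restore would set SO_SNDBUF/SO_RCVBUF first (which sets the lock bits
      as a side effect, as described above) and then call restore_buf_locks()
      with the mask recorded at dump time, e.g. 0 to re-enable automatic
      adjustment of both buffers.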
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: Add control-plane selftests for sch_mq · 625af9f0
      Peilin Ye authored
      Recently we added multi-queue support to netdevsim in commit d4861fc6
      ("netdevsim: Add multi-queue support"); add a few control-plane selftests
      for sch_mq using this new feature.
      
      Use nsPlugin.py to avoid network interface name collisions.
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: build all switchdev drivers as modules when the bridge is a module" · a54182b2
      Vladimir Oltean authored
      This reverts commit b0e81817. Explicit
      driver dependency on the bridge is no longer needed since
      switchdev_bridge_port_{,un}offload() is no longer implemented by the
      bridge driver but by switchdev.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make switchdev_bridge_port_{,un}offload loosely coupled with the bridge · 957e2235
      Vladimir Oltean authored
      With the introduction of explicit offloading API in switchdev in commit
      2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge
      ports are offloaded"), we started having Ethernet switch drivers calling
      directly into a function exported by net/bridge/br_switchdev.c, which is
      a function exported by the bridge driver.
      
      This means that drivers that did not have an explicit dependency on the
      bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
      possible to call a symbol exported by a driver that can be built as a
      module unless you are a module too.
      
      There was an attempt to solve the dependency issue in the form of commit
      b0e81817 ("net: build all switchdev drivers as modules when the
      bridge is a module"). Grygorii Strashko, however, says about it:
      
      | In my opinion, the problem is a bit bigger here than just fixing the
      | build :(
      |
      | In case, of ^cpsw the switchdev mode is kinda optional and in many
      | cases (especially for testing purposes, NFS) the multi-mac mode is
      | still preferable mode.
      |
      | There were no such tight dependency between switchdev drivers and
      | bridge core before and switchdev serviced as independent, notification
      | based layer between them, so ^cpsw still can be "Y" and bridge can be
      | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
      | will need to be set as "Y", or we will have to update drivers to
      | support build with BRIDGE=n and maintain separate builds for
      | networking vs non-networking testing.  But is this enough?  Wouldn't
      | it cause 'chain reaction' required to add more and more "Y" options
      | (like CONFIG_VLAN_8021Q)?
      |
      | PS. Just to be sure we on the same page - ARM builds will be forced
      | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
      | automation testing will just fail with omap2plus_defconfig.
      
      In the light of this, it would be desirable for some configurations to
      avoid dependencies between switchdev drivers and the bridge, and have
      the switchdev mode as completely optional within the driver.
      
      Arnd Bergmann also tried to write a patch which better expressed the
      build time dependency for Ethernet switch drivers where the switchdev
      support is optional, like cpsw/am65-cpsw, and this made the drivers
      follow the bridge (compile as module if the bridge is a module) only if
      the optional switchdev support in the driver was enabled in the first
      place:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
      
      but this still did not solve the fact that cpsw and am65-cpsw now must
      be built as modules when the bridge is a module - it just expressed
      correctly that optional dependency. But the new behavior is an apparent
      regression from Grygorii's perspective.
      
      So to support the use case where the Ethernet driver is built-in,
      NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
      need a framework that can handle the possible absence of the bridge from
      the running system, i.e. runtime bloatware as opposed to build-time
      bloatware.
      
      Luckily we already have this framework, since switchdev has been using
      it extensively. Events from the bridge side are transmitted to the
      driver side using notifier chains - this was originally done so that
      unrelated drivers could snoop for events emitted by the bridge towards
      ports that are implemented by other drivers (think of a switch driver
      with LAG offload that listens for switchdev events on a bonding/team
      interface that it offloads).
      
      There are also events which are transmitted from the driver side to the
      bridge side, which again are modeled using notifiers.
      SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
      notifying the bridge that a MAC address has been dynamically learned.
      So there is a precedent we can use for modeling the new framework.
      
      The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
      that the bridge needs to do when a port becomes offloaded is blocking in
      its nature: replay VLANs, MDBs etc. The calling context is indeed
      blocking (we are under rtnl_mutex), but the existing switchdev
      notification chain that the bridge is subscribed to is only the atomic
      one. So we need to subscribe the bridge to the blocking switchdev
      notification chain too.
      
      This patch:
      - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
        unchanged
      - moves the implementation of switchdev_bridge_port_{,un}offload from
        the bridge module into the switchdev module.
      - makes everybody that is subscribed to the switchdev blocking notifier
        chain "hear" offload & unoffload events
      - makes the bridge driver subscribe and handle those events
      - moves the bridge driver's handling of those events into 2 new
        functions called br_switchdev_port_{,un}offload. These functions
        contain in fact the core of the logic that was previously in
        switchdev_bridge_port_{,un}offload, just that now we go through an
        extra indirection layer to reach them.
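      The resulting decoupling can be illustrated with a toy userspace
      notifier chain. Everything below is a hypothetical model, not the
      kernel's notifier or switchdev API: the driver-facing
      model_port_offload() only needs the chain, and the bridge-side handler
      only runs once the bridge has subscribed. If nobody is subscribed, the
      event is silently ignored, which is consistent with the reasoning below
      about not needing a "bool handled".

```c
#include <assert.h>
#include <stddef.h>

enum brport_event { BRPORT_OFFLOADED, BRPORT_UNOFFLOADED };

/* Stand-in for the information carried with the event. */
struct brport_info {
	int port;
};

typedef int (*brport_cb)(enum brport_event ev, const struct brport_info *info);

#define MAX_SUBSCRIBERS 4
static brport_cb chain[MAX_SUBSCRIBERS];
static int nsubscribers;

static int chain_register(brport_cb cb)
{
	if (nsubscribers == MAX_SUBSCRIBERS)
		return -1;
	chain[nsubscribers++] = cb;
	return 0;
}

/* Call subscribers in order, stopping at the first error. */
static int chain_call(enum brport_event ev, const struct brport_info *info)
{
	for (int i = 0; i < nsubscribers; i++) {
		int err = chain[i](ev, info);

		if (err)
			return err;
	}
	return 0;
}

/* "switchdev" side, called by drivers: it only needs the chain, so it does
 * not depend on any symbol of the bridge. */
static int model_port_offload(int port)
{
	struct brport_info info = { .port = port };

	return chain_call(BRPORT_OFFLOADED, &info);
}

static int model_port_unoffload(int port)
{
	struct brport_info info = { .port = port };

	return chain_call(BRPORT_UNOFFLOADED, &info);
}

/* "bridge" side: subscribes and does the (possibly blocking) work. */
static int offloaded_ports;

static int bridge_brport_event(enum brport_event ev,
			       const struct brport_info *info)
{
	assert(info != NULL);
	if (ev == BRPORT_OFFLOADED)
		offloaded_ports++; /* real code would replay VLANs, MDBs, ... */
	else
		offloaded_ports--;
	return 0;
}
```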
      
      Unlike all the other switchdev notification structures, the structure
      used to carry the bridge port information, struct
      switchdev_notifier_brport_info, does not contain a "bool handled".
      This is because in the current usage pattern, we always know that a
      switchdev bridge port offloading event will be handled by the bridge,
      because the switchdev_bridge_port_offload() call was initiated by a
      NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
      bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
      couldn't have happened.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-next-for-5.15-20210804' of... · 9c0532f9
      David S. Miller authored
      Merge tag 'linux-can-next-for-5.15-20210804' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2021-08-04
      
      this is a pull request of 5 patches for net-next/master.
      
      The first patch is by me and fixes a typo in a comment in the CAN
      J1939 protocol.
      
      The next 2 patches are by Oleksij Rempel and update the CAN J1939
      protocol to send RX status updates via the error queue mechanism.
      
      The next patch is by me and adds a missing variable initialization to
      the flexcan driver (the problem was introduced in the current net-next
      cycle).
      
      The last patch is by Aswath Govindraju and adds power-domains to the
      Bosch m_can DT binding documentation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: can: Document power-domains property · d85165b2
      Aswath Govindraju authored
      Document power-domains property for adding the Power domain provider.
      
      Link: https://lore.kernel.org/r/20210802091822.16407-1-a-govindraju@ti.com
      Signed-off-by: Aswath Govindraju <a-govindraju@ti.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: flexcan: flexcan_clks_enable(): add missing variable initialization · 33626669
      Marc Kleine-Budde authored
      This patch adds the missing initialization of the "err" variable in
      the flexcan_clks_enable() function.
      
      Fixes: d9cead75 ("can: flexcan: add mcf5441x support")
      Link: https://lore.kernel.org/r/20210728075428.1493568-1-mkl@pengutronix.de
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Angelo Dureghello <angelo@kernel-space.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
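      As a rough illustration of this class of bug, here is a hypothetical sketch
      (not the flexcan driver code) of a helper that enables two optional clocks.
      It assumes, without confirmation from the commit, that the mcf5441x change
      made it possible for both clocks to be absent; on that path neither branch
      assigns "err", so it must start at 0. The clock struct, the *_stub helpers
      and the field names are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a clock handle; a NULL pointer means the
 * platform has no such clock. */
struct clk {
	bool broken; /* simulate the enable call failing */
};

struct priv {
	struct clk *clk_ipg;
	struct clk *clk_per;
};

/* Hypothetical stubs standing in for the kernel clock helpers. */
static int clk_enable_stub(struct clk *clk)
{
	return clk->broken ? -EIO : 0;
}

static void clk_disable_stub(struct clk *clk)
{
	(void)clk; /* nothing to undo in this sketch */
}

static int clks_enable(const struct priv *priv)
{
	/* The fix: without "= 0", err is returned uninitialized when
	 * both clocks are absent, because neither branch below runs. */
	int err = 0;

	assert(priv != NULL);

	if (priv->clk_ipg) {
		err = clk_enable_stub(priv->clk_ipg);
		if (err)
			return err;
	}

	if (priv->clk_per) {
		err = clk_enable_stub(priv->clk_per);
		if (err && priv->clk_ipg)
			clk_disable_stub(priv->clk_ipg); /* roll back the first clock */
	}

	return err;
}
```

      Dropping the "= 0" leaves the both-clocks-NULL path returning an
      indeterminate value, the kind of path that compiler warnings such as
      clang's -Wsometimes-uninitialized are meant to catch.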
    • can: j1939: extend UAPI to notify about RX status · 5b9272e9
      Oleksij Rempel authored
      To be able to create applications with user-friendly feedback, we need
      to be able to provide receive status information.
      
      A typical ETP transfer may take seconds or even hours. To give the user
      some feedback or show a progress bar, the stack should push status
      updates. As with the TX information, the socket error queue will be
      used, with the following new signals:
      - J1939_EE_INFO_RX_RTS   - a Request To Send (RTS) was received and accepted
      - J1939_EE_INFO_RX_DPO   - a Data Packet Offset (DPO) was received
      - J1939_EE_INFO_RX_ABORT - the RX session was aborted
      
      Instead of a completion signal, the user will receive the data itself.
      To activate these signals, the application should set
      SOF_TIMESTAMPING_RX_SOFTWARE in the SO_TIMESTAMPING socket option. This
      opt-in avoids unpredictable behavior in existing applications.
      
      Link: https://lore.kernel.org/r/20210707094854.30781-3-o.rempel@pengutronix.de
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
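      A minimal user-space sketch of the opt-in described in this commit,
      assuming a C program built against the Linux UAPI headers.
      enable_rx_status() and its extra_flags parameter are hypothetical. On a
      real system "sock" would be a CAN_J1939 socket, and the RX status
      messages would then be read from its error queue with
      recvmsg(..., MSG_ERRQUEUE), which is not shown here.

```c
#include <assert.h>
#include <sys/socket.h>       /* setsockopt(), SOL_SOCKET, SO_TIMESTAMPING */
#include <linux/net_tstamp.h> /* SOF_TIMESTAMPING_* flags */

/*
 * Hypothetical helper: opt a socket in to RX status reporting by setting
 * SOF_TIMESTAMPING_RX_SOFTWARE in its SO_TIMESTAMPING flags. Any flags the
 * caller already relies on (e.g. for TX status) are passed in extra_flags.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_rx_status(int sock, unsigned int extra_flags)
{
	unsigned int flags = SOF_TIMESTAMPING_RX_SOFTWARE | extra_flags;

	assert(sock >= 0);
	return setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

      Because SO_TIMESTAMPING replaces the socket's whole flag set, an
      application that already requests TX status should OR those flags in
      here rather than issue a second, overwriting setsockopt() call.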
    • ipv6: exthdrs: get rid of indirect calls in ip6_parse_tlv() · 51b8f812
      Eric Dumazet authored
      As presented last month in our "BIG TCP" talk at netdev 0x15,
      we plan to use IPv6 jumbograms.
      
      One of the minor problems we talked about is the fact that
      ip6_parse_tlv() currently uses tables to list known TLVs,
      thus relying on potentially expensive indirect calls.
      
      While we could mitigate this cost using macros from
      indirect_call_wrapper.h, we can also get rid of the tables
      and let the compiler emit optimized code.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Justin Iurman <justin.iurman@uliege.be>
      Cc: Coco Li <lixiaoyan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
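      To illustrate the two dispatch styles this commit contrasts, here is a
      simplified, hypothetical sketch (not the kernel's ip6_parse_tlv()). The
      table version reaches each handler through a function pointer, i.e. an
      indirect call; the switch version makes direct calls that the compiler
      can inline. The option type codes are the IPv6 Router Alert (5) and
      Jumbo Payload (0xC2) values, but the handlers only check the option
      length and stand in for the real processing.

```c
#include <assert.h>
#include <stddef.h>

/* Option type codes; handler logic below is illustrative only. */
enum { OPT_ROUTER_ALERT = 0x05, OPT_JUMBO = 0xC2 };

/* Each handler returns 1 if the option is acceptable, 0 otherwise.
 * opt[0] is the option type, opt[1] the option data length. */
static int handle_router_alert(const unsigned char *opt)
{
	return opt[1] == 2; /* Router Alert carries 2 bytes of data */
}

static int handle_jumbo(const unsigned char *opt)
{
	return opt[1] == 4; /* Jumbo Payload carries 4 bytes of data */
}

/* Table-driven dispatch: the handler is called through a pointer. */
struct tlv_handler {
	int type;
	int (*fn)(const unsigned char *opt);
};

static const struct tlv_handler handlers[] = {
	{ OPT_ROUTER_ALERT, handle_router_alert },
	{ OPT_JUMBO, handle_jumbo },
};

static int dispatch_table(const unsigned char *opt)
{
	assert(opt != NULL);
	for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
		if (handlers[i].type == opt[0])
			return handlers[i].fn(opt); /* indirect call */
	return 0; /* unknown option type */
}

/* Switch-based dispatch: every callee is a direct call the compiler
 * is free to inline. */
static int dispatch_switch(const unsigned char *opt)
{
	assert(opt != NULL);
	switch (opt[0]) {
	case OPT_ROUTER_ALERT:
		return handle_router_alert(opt);
	case OPT_JUMBO:
		return handle_jumbo(opt);
	default:
		return 0; /* unknown option type */
	}
}
```

      Both functions accept and reject the same inputs; the difference is
      only in how the call is emitted, which is what makes the switch form
      cheaper when indirect calls are expensive.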
    • Merge branch 'm7530-sw-fallback' · d8517985
      David S. Miller authored
      DENG Qingfang says:
      
      ====================
      mt7530 software fallback bridging fix
      
      DSA core has gained software fallback support since commit 2f5dc00f,
      but it does not work properly on mt7530. This patch series fixes the
      issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>