1. 02 Apr, 2023 3 commits
    • mlxsw: core_thermal: Use static trip points for transceiver modules · 5601ef91
      Ido Schimmel authored
      The driver registers a thermal zone for each transceiver module and
      tries to set the trip point temperatures according to the thresholds
      read from the transceiver. If a threshold cannot be read or if a
      transceiver is unplugged, the trip point temperature is set to zero,
      which means that it is disabled as far as the thermal subsystem is
      concerned.
      
      A recent change in the thermal core made it so that such trip points are
      no longer marked as disabled, which led the thermal subsystem to
      incorrectly set the associated cooling devices to their maximum
      state [1]. A fix to restore this behavior was merged in commit
      f1b80a38 ("thermal: core: Restore behavior regarding invalid trip
      points"). However, the thermal maintainer suggested not relying on this
      behavior and instead always registering a valid array of trip points [2].
      
      Therefore, create a static array of trip points with sane defaults
      (suggested by Vadim) and register it with the thermal zone of each
      transceiver module. User space can choose to override these defaults
      using the thermal zone sysfs interface since these files are writable.
      
      Before:
      
       $ cat /sys/class/thermal/thermal_zone11/type
       mlxsw-module11
       $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
       65000
       75000
       80000
      
      After:
      
       $ cat /sys/class/thermal/thermal_zone11/type
       mlxsw-module11
       $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
       55000
       65000
       80000
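A rough userspace illustration of the approach: one static, read-only table of defaults, with each module's thermal zone getting its own copy. The names trip_point, default_module_trips, module_zone and module_zone_init() are hypothetical stand-ins, not the mlxsw or thermal core API; only the 55000/65000/80000 millidegree values come from the "After" output above. Trip types and hysteresis are left out because the message does not state them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a trip point; other fields omitted. */
struct trip_point {
	int temperature;	/* millidegrees Celsius, as shown in sysfs */
};

#define MODULE_TRIP_COUNT 3

/* Shared read-only defaults, matching the "After" sysfs values. */
static const struct trip_point default_module_trips[] = {
	{ 55000 },
	{ 65000 },
	{ 80000 },
};

static_assert(sizeof(default_module_trips) / sizeof(default_module_trips[0]) ==
	      MODULE_TRIP_COUNT, "one entry per trip point");

/* Hypothetical per-module thermal zone state. */
struct module_zone {
	struct trip_point trips[MODULE_TRIP_COUNT];
};

/* Give each zone its own writable copy of the defaults, so a user-space
 * override on one module's zone leaves the other zones untouched. */
static void module_zone_init(struct module_zone *zone)
{
	memcpy(zone->trips, default_module_trips, sizeof(zone->trips));
}
```

Copying per zone is one way to make the sysfs files writable without one module's override leaking into another; the message itself does not say whether the driver copies or shares the table.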
      
      Also tested by reverting commit f1b80a38 ("thermal: core: Restore
      behavior regarding invalid trip points") and making sure that the
      associated cooling devices are not set to their maximum state.
      
      [1] https://lore.kernel.org/linux-pm/ZA3CFNhU4AbtsP4G@shredder/
      [2] https://lore.kernel.org/linux-pm/f78e6b70-a963-c0ca-a4b2-0d4c6aeef1fb@linaro.org/
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: minor reshuffle of napi_struct · dd2d6604
      Jakub Kicinski authored
      napi_id is read by GRO and drivers to mark skbs, and it currently
      sits at the end of the structure, in a mostly unused cache line.
      Move it up into a hole, and separate the clearly control path
      fields from the important ones.
      
      Before:
      
      struct napi_struct {
      	struct list_head           poll_list;            /*     0    16 */
      	long unsigned int          state;                /*    16     8 */
      	int                        weight;               /*    24     4 */
      	int                        defer_hard_irqs_count; /*    28     4 */
      	long unsigned int          gro_bitmask;          /*    32     8 */
      	int                        (*poll)(struct napi_struct *, int); /*    40     8 */
      	int                        poll_owner;           /*    48     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct net_device *        dev;                  /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct gro_list            gro_hash[8];          /*    64   192 */
      	/* --- cacheline 4 boundary (256 bytes) --- */
      	struct sk_buff *           skb;                  /*   256     8 */
      	struct list_head           rx_list;              /*   264    16 */
      	int                        rx_count;             /*   280     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct hrtimer             timer;                /*   288    64 */
      
      	/* XXX last struct has 4 bytes of padding */
      
      	/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
      	struct list_head           dev_list;             /*   352    16 */
      	struct hlist_node          napi_hash_node;       /*   368    16 */
      	/* --- cacheline 6 boundary (384 bytes) --- */
      	unsigned int               napi_id;              /*   384     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct task_struct *       thread;               /*   392     8 */
      
      	/* size: 400, cachelines: 7, members: 17 */
      	/* sum members: 388, holes: 3, sum holes: 12 */
      	/* paddings: 1, sum paddings: 4 */
      	/* last cacheline: 16 bytes */
      };
      
      After:
      
      struct napi_struct {
      	struct list_head           poll_list;            /*     0    16 */
      	long unsigned int          state;                /*    16     8 */
      	int                        weight;               /*    24     4 */
      	int                        defer_hard_irqs_count; /*    28     4 */
      	long unsigned int          gro_bitmask;          /*    32     8 */
      	int                        (*poll)(struct napi_struct *, int); /*    40     8 */
      	int                        poll_owner;           /*    48     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct net_device *        dev;                  /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct gro_list            gro_hash[8];          /*    64   192 */
      	/* --- cacheline 4 boundary (256 bytes) --- */
      	struct sk_buff *           skb;                  /*   256     8 */
      	struct list_head           rx_list;              /*   264    16 */
      	int                        rx_count;             /*   280     4 */
      	unsigned int               napi_id;              /*   284     4 */
      	struct hrtimer             timer;                /*   288    64 */
      
      	/* XXX last struct has 4 bytes of padding */
      
      	/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
      	struct task_struct *       thread;               /*   352     8 */
      	struct list_head           dev_list;             /*   360    16 */
      	struct hlist_node          napi_hash_node;       /*   376    16 */
      
      	/* size: 392, cachelines: 7, members: 17 */
      	/* sum members: 388, holes: 1, sum holes: 4 */
      	/* paddings: 1, sum paddings: 4 */
      	/* forced alignments: 1 */
      	/* last cacheline: 8 bytes */
      } __attribute__((__aligned__(8)));
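The effect of the move can be checked with simple cache-line arithmetic on the pahole offsets above. This sketch assumes 64-byte cache lines and a structure starting on a cache-line boundary, as pahole's own annotations do; the helper names and constants are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_SIZE 64	/* assumed, matching the pahole annotations */

/* Offsets copied from the pahole dumps above. */
enum {
	RX_COUNT_OFF    = 280,
	NAPI_ID_OLD_OFF = 384,
	NAPI_ID_NEW_OFF = 284,
	TIMER_OFF       = 288,
};

/* rx_count (4 bytes at 280) ends at 284 and timer starts at 288, so the
 * 4-byte hole there is exactly the size of napi_id. */
static_assert(RX_COUNT_OFF + 4 + 4 == TIMER_OFF, "napi_id fills the hole");

/* Index of the cache line holding byte offset @off. */
static size_t cacheline_of(size_t off)
{
	return off / CACHELINE_SIZE;
}

/* True if a field of @size bytes at @off does not straddle two lines. */
static bool fits_in_one_line(size_t off, size_t size)
{
	return cacheline_of(off) == cacheline_of(off + size - 1);
}
```

cacheline_of(384) is 6, the last and mostly empty line of the old 400-byte layout, while cacheline_of(284) is 4, the same line as skb and rx_count (offsets 256 to 319).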
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: Add support for VF to specify its primary MAC address · ceb29474
      Sylwester Dziedziuch authored
      Currently, the i40e driver does not handle MAC addresses differently
      depending on whether they are legacy or primary.
      Introduce new checks for VF to be able to specify its primary MAC
      address based on the VIRTCHNL_ETHER_ADDR_PRIMARY type.
      
      Primary MAC addresses are treated differently from legacy
      ones in the following cases:
      1. If a unicast MAC is being added and it is specified as
      VIRTCHNL_ETHER_ADDR_PRIMARY, then replace the current
      default_lan_addr.addr.
      2. If a unicast MAC is being deleted and its type
      is specified as VIRTCHNL_ETHER_ADDR_PRIMARY, then zero the
      hw_lan_addr.addr.
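A minimal sketch of the two rules above, assuming a simplified per-VF structure. The names vf_addrs, vf_add_mac(), vf_del_mac() and the addr_type enum are hypothetical; they only mirror the fields and the VIRTCHNL_ETHER_ADDR_PRIMARY type named in the message. This is not the i40e code, and the existing legacy-address handling is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical mirror of the legacy/primary address types. */
enum addr_type { ADDR_LEGACY, ADDR_PRIMARY };

/* Hypothetical subset of per-VF state, named after the message's fields. */
struct vf_addrs {
	uint8_t default_lan_addr[ETH_ALEN];
	uint8_t hw_lan_addr[ETH_ALEN];
};

/* The I/G bit (lowest bit of the first octet) is clear for unicast. */
static bool is_unicast(const uint8_t *addr)
{
	return !(addr[0] & 0x01);
}

/* Rule 1: adding a unicast PRIMARY address replaces default_lan_addr. */
static void vf_add_mac(struct vf_addrs *vf, const uint8_t *addr,
		       enum addr_type type)
{
	assert(vf != NULL && addr != NULL);
	if (is_unicast(addr) && type == ADDR_PRIMARY)
		memcpy(vf->default_lan_addr, addr, ETH_ALEN);
}

/* Rule 2: deleting a unicast PRIMARY address zeroes hw_lan_addr. */
static void vf_del_mac(struct vf_addrs *vf, const uint8_t *addr,
		       enum addr_type type)
{
	assert(vf != NULL && addr != NULL);
	if (is_unicast(addr) && type == ADDR_PRIMARY)
		memset(vf->hw_lan_addr, 0, ETH_ALEN);
}
```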
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 01 Apr, 2023 1 commit
  3. 31 Mar, 2023 20 commits
    • Merge tag 'nf-next-2023-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 54fd494a
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      netfilter updates for net-next
      
      1. No need to disable BH in nfnetlink proc handler, freeing happens
         via call_rcu.
      2. Expose classid in nfnetlink_queue, from Eric Sage.
      3. Fix nfnetlink message description comments, from Matthieu De Beule.
      4. Allow removal of offloaded connections via ctnetlink, from Paul Blakey.
      
      * tag 'nf-next-2023-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
        netfilter: ctnetlink: Support offloaded conntrack entry deletion
        netfilter: Correct documentation errors in nf_tables.h
        netfilter: nfnetlink_queue: enable classid socket info retrieval
        netfilter: nfnetlink_log: remove rcu_bh usage
      ====================
      
      Link: https://lore.kernel.org/r/20230331104809.2959-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Peng Fan
    • tcp: Refine SYN handling for PAWS. · ee05d90d
      Kuniyuki Iwashima authored
      Our Network Load Balancer (NLB) [0] has multiple nodes with different
      IP addresses, and each node forwards TCP flows from clients to backend
      targets.  NLB has an option to preserve the client's source IP address
      and port when routing packets to backend targets. [1]
      
      When a client connects to two different NLB nodes, they may select the
      same backend target.  Then, if the client has used the same source IP
      and port, the two flows at the backend side will have the same 4-tuple.
      
      While testing around such cases, I saw these sequences on the backend
      target.
      
      IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2819965599, win 62727, options [mss 8365,sackOK,TS val 1029816180 ecr 0,nop,wscale 7], length 0
      IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 3040695044, ack 2819965600, win 62643, options [mss 8961,sackOK,TS val 1224784076 ecr 1029816180,nop,wscale 7], length 0
      IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 1, win 491, options [nop,nop,TS val 1029816181 ecr 1224784076], length 0
      IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2681819307, win 62727, options [mss 8365,sackOK,TS val 572088282 ecr 0,nop,wscale 7], length 0
      IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [.], ack 1, win 490, options [nop,nop,TS val 1224794914 ecr 1029816181,nop,nop,sack 1 {4156821004:4156821005}], length 0
      
      It seems to be working correctly, but the last ACK was generated by
      tcp_send_dupack() and PAWSEstab was increased.  This is because the
      second connection has a smaller timestamp than the first one.
      
      In this case, we should send a dup ACK in tcp_send_challenge_ack()
      to increase the correct counter and rate-limit it properly.
      
      Let's check the SYN flag after the PAWS tests to avoid adding unnecessary
      overhead for most packets.
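The ordering described above can be sketched as a simplified model. This is not the kernel's implementation: it keeps only the timestamp comparison (real PAWS handling has further conditions), and the names paws_check(), tsval_older() and paws_action are hypothetical. The timestamps used in the checks are the TS val values from the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GCC and Clang convert out-of-range values to int32_t modulo 2^32;
 * the wraparound comparison below relies on that. */
static_assert((int32_t)0x80000000u < 0, "modulo conversion to int32_t assumed");

enum paws_action { PAWS_ACCEPT, PAWS_DUPACK, PAWS_CHALLENGE_ACK };

/* tsval is older than ts_recent if the signed 32-bit difference is
 * negative; this tolerates wraparound of the timestamp clock. */
static bool tsval_older(uint32_t tsval, uint32_t ts_recent)
{
	return (int32_t)(tsval - ts_recent) < 0;
}

/* The SYN flag is consulted only after the PAWS test has failed, so
 * segments that pass it do no extra work. */
static enum paws_action paws_check(uint32_t tsval, uint32_t ts_recent, bool syn)
{
	if (!tsval_older(tsval, ts_recent))
		return PAWS_ACCEPT;
	/* SYN on an established flow: rate-limited challenge ACK. */
	if (syn)
		return PAWS_CHALLENGE_ACK;
	/* Any other PAWS failure: duplicate ACK, counted as PAWSEstab. */
	return PAWS_DUPACK;
}
```

With the trace values, the second SYN (TS val 572088282) is older than 1029816181 from the established flow, so it maps to PAWS_CHALLENGE_ACK, while the same timestamp on a non-SYN segment maps to PAWS_DUPACK.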
      
      Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html [0]
      Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#client-ip-preservation [1]
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: Fix mc_filter calculation · ae63ad9b
      Herbert Xu authored
      On Wed, Mar 29, 2023 at 08:10:26AM +0000, patchwork-bot+netdevbpf@kernel.org wrote:
      >
      > Here is the summary with links:
      >   - [1/2] macvlan: Skip broadcast queue if multicast with single receiver
      >     https://git.kernel.org/netdev/net-next/c/d45276e75e90
      >   - [2/2] macvlan: Add netlink attribute for broadcast cutoff
      >     https://git.kernel.org/netdev/net-next/c/954d1fa1ac93
      
      Sorry, I made an error and posted my patches from an earlier
      revision so a follow-up fix was missing:
      
      ---8<---
      The bc_cutoff patch broke the calculation of mc_filter causing
      some multicast packets to not make it through to the targeted
      device.
      
      Fix this by checking whether vlan is set instead of cutoff >= 0.
      
      Also move the cutoff < 0 logic into macvlan_recompute_bc_filter
      so that it doesn't change the mc_filter at all.
      
      Fixes: d45276e7 ("macvlan: Skip broadcast queue if multicast with single receiver")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-next-2023-03-30' of... · ce7928f7
      Jakub Kicinski authored
      Merge tag 'wireless-next-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
      
      Johannes Berg says:
      
      ====================
      Major stack changes:
      
       * TC offload support for drivers below mac80211
       * reduced neighbor report (RNR) handling for AP mode
       * mac80211 mesh fast-xmit and fast-rx support
       * support for another mesh A-MSDU format
         (seems nobody got the spec right)
      
      Major driver changes:
      
      Kalle moved the drivers that were just plain C files
      in drivers/net/wireless/ to legacy/ and virtual/ dirs.
      
      hwsim
       * multi-BSSID support
       * some FTM support
      
      ath11k
       * MU-MIMO parameters support
       * ack signal support for management packets
      
      rtl8xxxu
       * support for RTL8710BU aka RTL8188GU chips
      
      rtw89
       * support for various newer firmware APIs
      
      ath10k
       * enabled threaded NAPI on WCN3990
      
      iwlwifi
       * lots of work for multi-link/EHT (wifi7)
       * hardware timestamping support for some devices/firmwares
       * TX beacon protection on newer hardware
      
      * tag 'wireless-next-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (181 commits)
        wifi: clean up erroneously introduced file
        wifi: iwlwifi: mvm: correctly use link in iwl_mvm_sta_del()
        wifi: iwlwifi: separate AP link management queues
        wifi: iwlwifi: mvm: free probe_resp_data later
        wifi: iwlwifi: bump FW API to 75 for AX devices
        wifi: iwlwifi: mvm: move max_agg_bufsize into host TLC lq_sta
        wifi: iwlwifi: mvm: send full STA during HW restart
        wifi: iwlwifi: mvm: rework active links counting
        wifi: iwlwifi: mvm: update mac config when assigning chanctx
        wifi: iwlwifi: mvm: use the correct link queue
        wifi: iwlwifi: mvm: clean up mac_id vs. link_id in MLD sta
        wifi: iwlwifi: mvm: fix station link data leak
        wifi: iwlwifi: mvm: initialize max_rc_amsdu_len per-link
        wifi: iwlwifi: mvm: use appropriate link for rate selection
        wifi: iwlwifi: mvm: use the new lockdep-checking macros
        wifi: iwlwifi: mvm: remove chanctx WARN_ON
        wifi: iwlwifi: mvm: avoid sending MAC context for idle
        wifi: iwlwifi: mvm: remove only link-specific AP keys
        wifi: iwlwifi: mvm: skip inactive links
        wifi: iwlwifi: mvm: adjust iwl_mvm_scan_respect_p2p_go_iter() for MLO
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230330205612.921134-1-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tools-ynl-fill-in-some-gaps-of-ethtool-spec' · dee1efb3
      Jakub Kicinski authored
      Stanislav Fomichev says:
      
      ====================
      tools: ynl: fill in some gaps of ethtool spec
      
      I was trying to fill in the spec while exploring ethtool API for some
      related work. I don't think I'll have the patience to fill in the rest,
      so decided to share whatever I currently have.
      
      Patches 1-2 add the be16 + spec.
      Patches 3-4 implement an ethtool-like python tool to test the spec.
      
      Patches 3-4 are there because it felt more fun to do the tool instead
      of writing the actual tests; feel free to drop it; sharing mostly
      to show that the spec is not a complete nonsense.
      
      The spec is not 100% complete, see patch 2 for what's missing.
      I was hoping to finish the stats-get message, but I'm too dumb
      to implement bitmask marshaling (multi-attr).
      ====================
      
      Link: https://lore.kernel.org/r/20230329221655.708489-1-sdf@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl: ethtool testing tool · f3d07b02
      Stanislav Fomichev authored
      This is what I've been using to see whether the spec makes sense.
      A small subset of getters (mostly the unprivileged ones) is implemented.
      Some setters (channels) also work.
      Setters for messages with bitmasks are not implemented.
      
      Initially I was trying to make this tool look 1:1 like real ethtool,
      but eventually gave up :-)
      
      Sample output:
      
      $ ./tools/net/ynl/ethtool enp0s31f6
      Settings for enp0s31f6:
      Supported ports: [ TP ]
      Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half
      100baseT/Full 1000baseT/Full
      Supported pause frame use: no
      Supports auto-negotiation: yes
      Supported FEC modes: Not reported
      Speed: Unknown!
      Duplex: Unknown! (255)
      Auto-negotiation: on
      Port: Twisted Pair
      PHYAD: 2
      Transceiver: Internal
      MDI-X: Unknown (auto)
      Current message level: drv probe link
      Link detected: no
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl: replace print with NlError · 48993e22
      Stanislav Fomichev authored
      Instead of dumping the error on stdout, raise an NlError and give the
      caller the opportunity to decide what to do with it. This is mostly
      for the ethtool testing.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl: populate most of the ethtool spec · a353318e
      Stanislav Fomichev authored
      Things that are not implemented:
      - cable tests
      - bitmasks in the requests don't work (needs multi-attr support in ynl.py)
      - stats-get seems to return nonsense (not passing a bitmask properly?)
      - notifications are not tested
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl: support byte-order in cli · 9f7cc57f
      Stanislav Fomichev authored
      Used by the ethtool spec.
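The ynl tooling itself is Python (ynl.py). Purely to illustrate what byte-order support means for a be16 attribute, this C sketch decodes a 2-byte big-endian payload into a host-order value. The function name decode_be16() is hypothetical, and the sketch assumes the attribute payload holds exactly the two value bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 16-bit big-endian (network order) payload into a host-order
 * value. Assembling the bytes explicitly works on any host endianness. */
static uint16_t decode_be16(const uint8_t *payload, size_t len)
{
	assert(payload != NULL && len >= 2);
	return (uint16_t)((payload[0] << 8) | payload[1]);
}
```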
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: update type of prof fields in nix_aw_enq_req · 709d0b88
      Simon Horman authored
      Update the type of the prof and prof_mask fields in nix_aw_enq_req
      from u64 to struct nix_bandprof_s, which is 128 bits wide.

      This addresses warnings about string fortification when compiling
      with gcc-12 and W=1.

      Although the union of which these fields are members is 128 bits
      wide, and thus writing a 128-bit entity is safe, the compiler flags
      a problem because the field being written is only 64 bits wide.
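A hypothetical reduction of the situation described above: the union is 128 bits wide either way, but a fortified memcpy() compares the copy length against the size of the named destination field. With the field typed as a 128-bit structure, the field size and the copy size agree. The names bandprof_s, enq_req, prof_u64 and set_prof() are stand-ins, not the octeontx2 definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit profile type standing in for struct nix_bandprof_s. */
struct bandprof_s {
	uint64_t w[2];
};

static_assert(sizeof(struct bandprof_s) == 16, "profile is 128 bits wide");

/* Hypothetical request layout: both members share one 128-bit union. */
struct enq_req {
	union {
		struct bandprof_s prof;	/* after the change: 128-bit field */
		uint64_t prof_u64;	/* before the change: 64-bit field */
	};
};

/* The copy length equals sizeof(req->prof), the destination field's own
 * size, which is what a field-aware fortified memcpy() checks against. */
static void set_prof(struct enq_req *req, const struct bandprof_s *src)
{
	memcpy(&req->prof, src, sizeof(req->prof));
}
```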
      
        CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.o
      scripts/Makefile.build:252: ./drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_dcbnl.o is added to multiple modules: rvu_nicpf rvu_nicvf
        CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.o
        CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.o
        CC [M]  drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.o
        CC [M]  drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.o
      In file included from ./include/linux/string.h:254,
                       from ./include/linux/bitmap.h:11,
                       from ./include/linux/cpumask.h:12,
                       from ./arch/x86/include/asm/paravirt.h:17,
                       from ./arch/x86/include/asm/cpuid.h:62,
                       from ./arch/x86/include/asm/processor.h:19,
                       from ./arch/x86/include/asm/timex.h:5,
                       from ./include/linux/timex.h:67,
                       from ./include/linux/time32.h:13,
                       from ./include/linux/time.h:60,
                       from ./include/linux/stat.h:19,
                       from ./include/linux/module.h:13,
                       from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:8:
      In function 'fortify_memcpy_chk',
          inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:969:4:
      ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
        529 |                         __read_overflow2_field(q_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      In function 'fortify_memcpy_chk',
          inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:984:4:
      ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
        529 |                         __read_overflow2_field(q_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      cc1: all warnings being treated as errors
      
      Compile tested only!
      Signed-off-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230329112356.458072-1-horms@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-sched-act_tunnel_key-add-support-for-tunnel_dont_fragment' · f76b9bba
      Jakub Kicinski authored
      Davide Caratti says:
      
      ====================
      net/sched: act_tunnel_key: add support for TUNNEL_DONT_FRAGMENT
      
      - patch 1 extends TC tunnel_key action to add support for TUNNEL_DONT_FRAGMENT
      - patch 2 extends tdc to skip tests when iproute2 support is missing
      - patch 3 adds a tdc test case to verify functionality of the control plane
      - patch 4 adds a net/forwarding test case to verify functionality of the data plane
      ====================
      
      Link: https://lore.kernel.org/r/cover.1680082990.git.dcaratti@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: forwarding: add tunnel_key "nofrag" test case · 533a89b1
      Davide Caratti authored
      Add a selftest that configures metadata tunnel encapsulation using the TC
      "tunnel_key" action: it includes a test case for setting "nofrag" flag.
      
      Example output:
      
       # selftests: net/forwarding: tc_tunnel_key.sh
       # TEST: tunnel_key nofrag (skip_hw)                                   [ OK ]
       # INFO: Could not test offloaded functionality
       ok 1 selftests: net/forwarding: tc_tunnel_key.sh
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: tc-testing: add tunnel_key "nofrag" test case · b8617f8e
      Davide Caratti authored
      # ./tdc.py -e 6bda -l
       6bda: (actions, tunnel_key) Add tunnel_key action with nofrag option
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: tc-testing: add "depends_on" property to skip tests · 7f3f8640
      Davide Caratti authored
      Currently, users can skip individual test cases by writing

        "skip": "yes"

      in the scenario file. Extend this functionality by introducing 'dependsOn':
      it is an optional property like "skip", but its value is a command (for
      example, a probe on iproute2 to check if it supports a specific feature).
      If such a property is present, tdc executes that command and skips the test
      when the return value is non-zero.
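The skip rule can be sketched as below. tdc itself is written in Python (tdc.py), so this C version is only an illustration of the described behavior, assuming the command is run through the shell. The should_skip() name, and the choice to also skip when the shell cannot be started, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run the probe command and report whether the test case should be
 * skipped. A test case without the property (cmd == NULL) is never
 * skipped by this rule. */
static bool should_skip(const char *cmd)
{
	int status;

	if (cmd == NULL)
		return false;
	assert(cmd[0] != '\0');		/* an empty probe is a scenario bug */

	status = system(cmd);		/* runs "/bin/sh -c cmd" */
	if (status == -1)
		return true;		/* probe could not run: skip (assumption) */

	/* Skip on any non-zero exit status or abnormal termination. */
	return !WIFEXITED(status) || WEXITSTATUS(status) != 0;
}
```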
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: act_tunnel_key: add support for "don't fragment" · 2384127e
      Davide Caratti authored
      Extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT.
      Suggested-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: rtnetlink: Fix do_test_address_proto() · 46e9acb7
      Petr Machata authored
      This selftest was introduced recently in the commit cited below. It misses
      several check_err() invocations to actually verify that the previous
      command succeeded. When these are added, the first one fails, because
      besides the addresses added by hand, there can be a link-local address
      added by the kernel. Adjust the check to expect at least three addresses
      instead of exactly three, and add the missing check_err() calls.
      
      Furthermore, the explanatory comments assume that the address with no
      protocol is $addr2, when in fact it is $addr3. Update the comments.
      
      Fixes: 6a414fd7 ("selftests: rtnetlink: Add an address proto test")
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/53a579bc883e1bf2fe490d58427cf22c2d1aa21f.1680102695.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: ti: Fix format specifier in netcp_create_interface() · 3292004c
      Nathan Chancellor authored
      After commit 3948b059 ("net: introduce a config option to tweak
      MAX_SKB_FRAGS"), clang warns:
      
        drivers/net/ethernet/ti/netcp_core.c:2085:4: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
                                MAX_SKB_FRAGS);
                                ^~~~~~~~~~~~~
        include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
                dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
                                                                       ~~~     ^~~~~~~~~~~
        include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
                        _p_func(dev, fmt, ##__VA_ARGS__);                       \
                                     ~~~    ^~~~~~~~~~~
        include/linux/skbuff.h:352:23: note: expanded from macro 'MAX_SKB_FRAGS'
        #define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
                              ^~~~~~~~~~~~~~~~~~~~
        ./include/generated/autoconf.h:11789:30: note: expanded from macro 'CONFIG_MAX_SKB_FRAGS'
        #define CONFIG_MAX_SKB_FRAGS 17
                                     ^~
        1 warning generated.
      
      Follow the pattern of the rest of the tree by changing the specifier to
      '%u' and casting MAX_SKB_FRAGS explicitly to 'unsigned int', which
      eliminates the warning.
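A minimal userspace sketch of the fix pattern. MAX_SKB_FRAGS is defined here to expand to the plain int literal 17, as in the warning above; the message text "max frags %u" and format_frags_limit() are hypothetical. The explicit cast makes the argument type match %u however the Kconfig value ends up typed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors the macro chain shown in the clang notes above. */
#define CONFIG_MAX_SKB_FRAGS 17
#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS

/* "%ld" would mismatch the int literal; "%u" plus an explicit
 * (unsigned int) cast always matches. */
static int format_frags_limit(char *buf, size_t len)
{
	int n = snprintf(buf, len, "max frags %u", (unsigned int)MAX_SKB_FRAGS);

	assert(n >= 0 && (size_t)n < len);	/* no truncation in this sketch */
	return n;
}
```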
      
      Fixes: 3948b059 ("net: introduce a config option to tweak MAX_SKB_FRAGS")
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Link: https://lore.kernel.org/r/20230329-net-ethernet-ti-wformat-v1-1-83d0f799b553@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: fix db type confusion in host fdb/mdb add/del · eb1ab765
      Vladimir Oltean authored
      We have the following code paths:
      
      Host FDB (unicast RX filtering):
      
      dsa_port_standalone_host_fdb_add()   dsa_port_bridge_host_fdb_add()
                     |                                     |
                     +--------------+         +------------+
                                    |         |
                                    v         v
                               dsa_port_host_fdb_add()
      
      dsa_port_standalone_host_fdb_del()   dsa_port_bridge_host_fdb_del()
                     |                                     |
                     +--------------+         +------------+
                                    |         |
                                    v         v
                               dsa_port_host_fdb_del()
      
      Host MDB (multicast RX filtering):
      
      dsa_port_standalone_host_mdb_add()   dsa_port_bridge_host_mdb_add()
                     |                                     |
                     +--------------+         +------------+
                                    |         |
                                    v         v
                               dsa_port_host_mdb_add()
      
      dsa_port_standalone_host_mdb_del()   dsa_port_bridge_host_mdb_del()
                     |                                     |
                     +--------------+         +------------+
                                    |         |
                                    v         v
                               dsa_port_host_mdb_del()
      
      The logic added by commit 5e8a1e03 ("net: dsa: install secondary
      unicast and multicast addresses as host FDB/MDB") zeroes out
      db.bridge.num if the switch doesn't support ds->fdb_isolation
      (the majority doesn't). This is done for a reason explained in commit
      c2693363 ("net: dsa: request drivers to perform FDB isolation").
      
      Taking dsa_port_host_fdb_add() as an example (the others are similar),
      the problem is that this function handles:
      - DSA_DB_PORT databases, when called from
        dsa_port_standalone_host_fdb_add()
      - DSA_DB_BRIDGE databases, when called from
        dsa_port_bridge_host_fdb_add()
      
      So, if dsa_port_host_fdb_add() were to make any change to the
      "bridge.num" attribute of the database, this would only be correct for a
      DSA_DB_BRIDGE database, and a type confusion for a DSA_DB_PORT database.
      
      However, this bug is without consequences, for 2 reasons:
      
      - dsa_port_standalone_host_fdb_add() is only called from code which is
        (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that
        function only returns true if ds->fdb_isolation is set. So, the code
        only executes for DSA_DB_BRIDGE databases.
      
      - Even if the code were not dead for DSA_DB_PORT, we have the following
        memory layout:
      
      struct dsa_bridge {
      	struct net_device *dev;
      	unsigned int num;
      	bool tx_fwd_offload;
      	refcount_t refcount;
      };
      
      struct dsa_db {
      	enum dsa_db_type type;
      
      	union {
      		const struct dsa_port *dp; // DSA_DB_PORT
      		struct dsa_lag lag;
      		struct dsa_bridge bridge; // DSA_DB_BRIDGE
      	};
      };
      
      So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of
      type DSA_DB_PORT would access memory which is unused, because we only
      use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address
      as dsa_db :: bridge :: dev for DSA_DB_BRIDGE, thanks to the union definition.
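      The aliasing argument can be checked at compile time. Below is a minimal sketch using simplified stand-in structs. The "_stub" types, the contents of the LAG member and the use of a plain int instead of refcount_t are assumptions made for illustration. They are not the kernel headers. The sketch asserts that dp and bridge.dev share an offset and that bridge.num lies entirely past the dp pointer:

```c
#include <assert.h>
#include <stddef.h>

struct net_device;
struct dsa_port;

enum dsa_db_type { DSA_DB_PORT, DSA_DB_LAG, DSA_DB_BRIDGE };

/* Hypothetical placeholder for struct dsa_lag; its contents are assumed. */
struct dsa_lag_stub {
	struct net_device *dev;
	unsigned int id;
};

/* Mirrors the member order of struct dsa_bridge quoted above. */
struct dsa_bridge_stub {
	struct net_device *dev;
	unsigned int num;
	_Bool tx_fwd_offload;
	int refcount;                 /* refcount_t in the kernel */
};

struct dsa_db_stub {
	enum dsa_db_type type;
	union {
		const struct dsa_port *dp;        /* DSA_DB_PORT */
		struct dsa_lag_stub lag;
		struct dsa_bridge_stub bridge;    /* DSA_DB_BRIDGE */
	};
};

/* dp and bridge.dev start at the same address inside the union... */
static_assert(offsetof(struct dsa_db_stub, dp) ==
	      offsetof(struct dsa_db_stub, bridge.dev),
	      "dp aliases bridge.dev");

/* ...while bridge.num starts only after the dp pointer ends. */
static_assert(offsetof(struct dsa_db_stub, bridge.num) >=
	      offsetof(struct dsa_db_stub, dp) + sizeof(const struct dsa_port *),
	      "bridge.num does not overlap dp");

/* The type-confused (but harmless) write: dp is left intact. */
const struct dsa_port *zero_bridge_num(struct dsa_db_stub *db)
{
	db->bridge.num = 0;
	return db->dp;
}
```

      Calling zero_bridge_num() on a DSA_DB_PORT database returns the same dp pointer that was stored before the write.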
      
      It is correct to fix up dsa_db :: bridge :: num only from code paths
      that come from the bridge / switchdev, so move these there.
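      A heavily simplified sketch of the resulting call structure follows. All names, types and arguments here are hypothetical stand-ins. The real functions take the port, address and VID and call into the switch driver. The point is that the bridge.num fixup lives only in the bridge wrapper, and the common helper passes the database through untouched:

```c
#include <stdbool.h>

enum db_type { DB_PORT, DB_BRIDGE };

/* Hypothetical database descriptor; bridge_num only matters for DB_BRIDGE. */
struct db {
	enum db_type type;
	unsigned int bridge_num;
};

/* Hypothetical switch: records the last database handed to the "driver". */
struct sw {
	bool fdb_isolation;
	struct db last_db;
};

/* Common helper: no knowledge of which database type it was given. */
int host_fdb_add(struct sw *ds, struct db db)
{
	ds->last_db = db;
	return 0;
}

/* Standalone path: a port database, never touched by the fixup. */
int standalone_host_fdb_add(struct sw *ds)
{
	struct db db = { .type = DB_PORT };

	return host_fdb_add(ds, db);
}

/* Bridge path: the only place where bridge_num is adjusted. */
int bridge_host_fdb_add(struct sw *ds, unsigned int bridge_num)
{
	struct db db = { .type = DB_BRIDGE, .bridge_num = bridge_num };

	/* Drivers without FDB isolation must not see a bridge number. */
	if (!ds->fdb_isolation)
		db.bridge_num = 0;

	return host_fdb_add(ds, db);
}
```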
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      eb1ab765
    • Tom Rix
      net: ksz884x: remove unused change variable · 9a865a98
      Tom Rix authored
      clang with W=1 reports
      drivers/net/ethernet/micrel/ksz884x.c:3216:6: error: variable
        'change' set but not used [-Werror,-Wunused-but-set-variable]
              int change = 0;
                  ^
      This variable is not used so remove it.
      Signed-off-by: Tom Rix <trix@redhat.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230329125929.1808420-1-trix@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9a865a98
  4. 30 Mar, 2023 16 commits
    • Jakub Kicinski
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 79548b79
      Jakub Kicinski authored
      Conflicts:
      
      drivers/net/ethernet/mediatek/mtk_ppe.c
        3fbe4d8c ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
        92453132 ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      79548b79
    • Linus Torvalds
      Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b2bc47e9
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from CAN and WPAN.
      
        Still quite a few bugs from this release. This pull is a bit smaller
        because major subtrees went into the previous one. Or maybe people
        took spring break off?
      
        Current release - regressions:
      
         - phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
      
        Current release - new code bugs:
      
         - eth: wangxun: fix vector length of interrupt cause
      
         - vsock/loopback: consistently protect the packet queue with
           sk_buff_head.lock
      
         - virtio/vsock: fix header length on skb merging
      
         - wpan: ca8210: fix unsigned mac_len comparison with zero
      
        Previous releases - regressions:
      
         - eth: stmmac: don't reject VLANs when IFF_PROMISC is set
      
         - eth: smsc911x: avoid PHY being resumed when interface is not up
      
         - eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
      
         - eth: bnx2x: use the right build_skb() helper after core rework
      
         - wwan: iosm: fix 7560 modem crash on use on unsupported channel
      
        Previous releases - always broken:
      
         - eth: sfc: don't overwrite offload features at NIC reset
      
         - eth: r8169: fix RTL8168H and RTL8107E rx crc error
      
         - can: j1939: prevent deadlock by moving j1939_sk_errqueue()
      
         - virt: vmxnet3: use GRO callback when UPT is enabled
      
         - virt: xen: don't do grant copy across page boundary
      
         - phy: dp83869: fix default value for tx-/rx-internal-delay
      
         - dsa: ksz8: fix multiple issues with ksz8_fdb_dump
      
         - eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
      
         - eth: mtk_eth_soc: fix flow block refcounting logic
      
        Misc:
      
         - constify fwnode pointers in SFP handling"
      
      * tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
        net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
        net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
        net: ethernet: mtk_eth_soc: fix flow block refcounting logic
        net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
        net: dsa: sync unicast and multicast addresses for VLAN filters too
        net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
        xen/netback: use same error messages for same errors
        test/vsock: new skbuff appending test
        virtio/vsock: WARN_ONCE() for invalid state of socket
        virtio/vsock: fix header length on skb merging
        bnxt_en: Add missing 200G link speed reporting
        bnxt_en: Fix typo in PCI id to device description string mapping
        bnxt_en: Fix reporting of test result in ethtool selftest
        i40e: fix registers dump after run ethtool adapter self test
        bnx2x: use the right build_skb() helper
        net: ipa: compute DMA pool size properly
        net: wwan: iosm: fixes 7560 modem crash
        net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
        ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
        ice: add profile conflict check for AVF FDIR
        ...
      b2bc47e9
    • Linus Torvalds
      Merge tag 'for-6.3/dm-fixes-2' of... · b527ac44
      Linus Torvalds authored
      Merge tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix two DM core bugs in the code that handles splitting "abnormal" IO
         (discards, write zeroes and secure erase) and issuing that IO to the
         correct underlying devices (and offsets within those devices).
      
      * tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: fix __send_duplicate_bios() to always allow for splitting IO
        dm: fix improper splitting for abnormal bios
      b527ac44
    • Johannes Berg
      wifi: clean up erroneously introduced file · aa2aa818
      Johannes Berg authored
      Evidently Gregory sent this file, but I (and apparently everyone else)
      missed it entirely; remove it.
      
      Fixes: cf85123a ("wifi: iwlwifi: mvm: support enabling and disabling HW timestamping")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      aa2aa818
    • Linus Torvalds
      Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm · 0d3ff808
      Linus Torvalds authored
      Pull drm fixes from Daniel Vetter:
       "Two regression fixes in here, otherwise just the usual stuff:
      
         - i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
           more
      
         - amdgpu: dp mst and hibernate regression fix
      
         - etnaviv: revert fdinfo support (incl drm/sched revert), leak fix
      
         - misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
           fixes"
      
      * tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
        Revert "drm/scheduler: track GPU active time per entity"
        Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
        drm/etnaviv: fix reference leak when mmaping imported buffer
        drm/amdgpu: allow more APUs to do mode2 reset when go to S4
        drm/amd/display: Take FEC Overhead into Timeslot Calculation
        drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
        drm: test: Fix 32-bit issue in drm_buddy_test
        drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
        drm/nouveau/kms: Fix backlight registration
        drm/i915/perf: Drop wakeref on GuC RC error
        drm/i915/dpt: Treat the DPT BO as a framebuffer
        drm/i915/gem: Flush lmem contents after construction
        drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
        drm/i915: Disable DC states for all commits
        drm/i915: Workaround ICL CSC_MODE sticky arming
        drm/i915: Add a .color_post_update() hook
        drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
        drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
        drm/i915/pmu: Use functions common with sysfs to read actual freq
        accel/ivpu: Fix IPC buffer header status field value
        ...
      0d3ff808
    • Paul Blakey
      netfilter: ctnetlink: Support offloaded conntrack entry deletion · 9b7c68b3
      Paul Blakey authored
      Currently, offloaded conntrack entries (flows) can only be deleted
      after they are removed from offload, which happens on timeout, TCP
      state change or tc ct rule deletion. This can cause issues for
      users wishing to manually delete or flush existing entries.
      
      Support deletion of offloaded conntrack entries.
      
      Example usage:
       # Delete all offloaded (and non offloaded) conntrack entries
       # whose source address is 1.2.3.4
       $ conntrack -D -s 1.2.3.4
       # Delete all entries
       $ conntrack -F
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      9b7c68b3
    • Matthieu De Beule
      netfilter: Correct documentation errors in nf_tables.h · a25b8b71
      Matthieu De Beule authored
      - NFTA_RANGE_OP incorrectly says nft_cmp_ops instead of nft_range_ops.
      - NFTA_LOG_GROUP and NFTA_LOG_QTHRESHOLD claim NLA_U32 instead of NLA_U16.
      - NFTA_EXTHDR_SREG isn't documented as a register.
      Signed-off-by: Matthieu De Beule <matthieu.debeule@proton.ch>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      a25b8b71
    • Eric Sage
      netfilter: nfnetlink_queue: enable classid socket info retrieval · 28c1b6df
      Eric Sage authored
      This enables associating a socket with a v1 net_cls cgroup, which is
      useful for applying a per-cgroup policy when processing packets in userspace.
      Signed-off-by: Eric Sage <eric_sage@apple.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      28c1b6df
    • Florian Westphal
      netfilter: nfnetlink_log: remove rcu_bh usage · 356e2adb
      Florian Westphal authored
      The structure is freed via call_rcu(), so it's safe to use only
      rcu_read_lock().
      
      While at it, skip rcu_read_lock() for the lookup from the packet path;
      it's always called with RCU held.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      356e2adb
    • Mike Snitzer
      dm: fix __send_duplicate_bios() to always allow for splitting IO · 666eed46
      Mike Snitzer authored
      Commit 7dd76d1f ("dm: improve bio splitting and associated IO
      accounting") only called setup_split_accounting() from
      __send_duplicate_bios() if a single bio were being issued. But the case
      where duplicate bios are issued must call it too.
      
      Otherwise the bio won't be split and resubmitted (via recursion through
      block core back to DM) to submit the later portions of a bio (which may
      map to an entirely different target).
      
      For example, when discarding an entire DM striped device with the
      following DM table:
       vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
       vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
      
      Before (broken, discards the first striped target's devices twice):
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528
      
      After (works as expected):
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
       device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
       device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528
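      The "After" output is what you get when the discard is split at the boundary between the two table lines, so each half is sent only to its own target's devices. Below is a minimal sketch of that splitting arithmetic. The struct and function names are hypothetical. The sketch does not model DM's recursion through block core. It also assumes, as holds for this example table, that each piece covers whole stripes of every device.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified model of one striped line of a DM table. */
struct striped_target {
	sector_t begin;          /* first DM-device sector this line maps */
	sector_t len;            /* number of sectors it maps */
	unsigned int stripes;    /* number of underlying devices */
};

/* Sectors of [sector, sector + remaining) inside target ti; the rest must
 * be split off and sent to the next target. */
sector_t len_within_target(const struct striped_target *ti,
			   sector_t sector, sector_t remaining)
{
	sector_t left = ti->begin + ti->len - sector;

	return remaining < left ? remaining : left;
}

/* Split a discard of [sector, sector + len) at target boundaries and record
 * each piece's per-stripe length. Returns the number of pieces. */
size_t split_discard(const struct striped_target *table, size_t n,
		     sector_t sector, sector_t len, sector_t *per_stripe)
{
	size_t pieces = 0;

	for (size_t i = 0; i < n && len > 0; i++) {
		const struct striped_target *ti = &table[i];
		sector_t piece;

		if (sector < ti->begin || sector >= ti->begin + ti->len)
			continue;    /* next sector is not mapped by this line */

		piece = len_within_target(ti, sector, len);
		per_stripe[pieces++] = piece / ti->stripes;
		sector += piece;
		len -= piece;
	}
	return pieces;
}
```

      For the table above, a discard of all 159744 + 45056 sectors splits into two pieces. Their per-stripe lengths are 79872 and 22528, which match the len= values in the "After" output.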
      
      Fixes: 7dd76d1f ("dm: improve bio splitting and associated IO accounting")
      Cc: stable@vger.kernel.org
      Reported-by: Orange Kao <orange@aiven.io>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      666eed46
    • Mike Snitzer
      dm: fix improper splitting for abnormal bios · f7b58a69
      Mike Snitzer authored
      "Abnormal" bios include discards, write zeroes and secure erase. By no
      longer passing the calculated 'len' pointer, commit 7dd06a25 ("dm:
      allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
      senseless approach to disallowing dm_accept_partial_bio() from working
      for duplicate bios processed using __send_duplicate_bios().
      
      It inadvertently and incorrectly stopped the use of 'len' when
      initializing a target's io (in alloc_tio). As such, the resulting tio
      could address a larger area of a device than it should.
      
      For example, when discarding an entire DM striped device with the
      following DM table:
       vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
       vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
      
      Before this fix:
      
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
       blkdiscard: attempt to access beyond end of device
       loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
      
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
       blkdiscard: attempt to access beyond end of device
       loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
      
      After this fix:
      
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
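      The numbers in these logs follow from simple arithmetic. Without clamping to the target's length, the whole 204800-sector bio is divided across the first target's two stripes. That gives 102400 sectors per device, which overruns the 81920-sector loop devices when starting at sector 2048. Clamping to the 159744-sector target gives 79872 sectors per device, which ends exactly at the limit. Below is a sketch of that arithmetic only. The helper names are hypothetical and this is not DM's alloc_tio() code:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Per-stripe length handed to a striped target, optionally clamping the
 * bio to the part that actually lies inside the target. */
sector_t per_stripe_len(sector_t bio_len, sector_t target_len,
			unsigned int stripes, bool clamp)
{
	sector_t len = bio_len;

	if (clamp && len > target_len)
		len = target_len;
	return len / stripes;
}

/* Would an IO covering [start, start + len) run past a device of
 * 'limit' sectors? */
bool beyond_end(sector_t start, sector_t len, sector_t limit)
{
	return start + len > limit;
}
```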
      
      Fixes: 7dd06a25 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
      Cc: stable@vger.kernel.org
      Reported-by: Orange Kao <orange@aiven.io>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      f7b58a69
    • Felix Fietkau
      net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow · 92453132
      Felix Fietkau authored
      The cache needs to be flushed to ensure that the hardware stops offloading
      the flow immediately.
      
      Fixes: 33fc42de ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      92453132
    • Felix Fietkau
      net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload · 5f36ca1b
      Felix Fietkau authored
      Check for skb metadata in order to detect the case where the DSA header
      is not present.
      
      Fixes: 2d7605a7 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5f36ca1b
    • Felix Fietkau
      net: ethernet: mtk_eth_soc: fix flow block refcounting logic · 8c1cb87c
      Felix Fietkau authored
      Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
      call flow_block_cb_incref for a newly allocated cb.
      Also fix the accidentally inverted refcount check on unbind.
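      The invariant being restored is that every bind takes a reference, including the one that allocates the callback, and that unbind releases the callback only when its count drops to zero. Below is a minimal sketch of that pairing. The struct and function names are hypothetical stand-ins, not the kernel's flow_block_cb API, and userspace calloc()/free() replace the kernel allocator:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a flow block callback. */
struct block_cb {
	unsigned int refcnt;
};

/* BIND: reuse an existing cb or allocate one, taking a reference either way. */
struct block_cb *cb_bind(struct block_cb *existing)
{
	struct block_cb *cb = existing;

	if (!cb) {
		cb = calloc(1, sizeof(*cb));
		if (!cb)
			return NULL;
	}
	cb->refcnt++;    /* needed for a freshly allocated cb too */
	return cb;
}

/* UNBIND: drop a reference and free only when none remain.
 * Returns true if the cb was freed. */
bool cb_unbind(struct block_cb *cb)
{
	if (--cb->refcnt == 0) {
		free(cb);
		return true;
	}
	return false;
}
```

      If the allocation path skipped the increment, the first unbind would underflow the counter. If the zero check were inverted, a still-referenced callback would be freed.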
      
      Fixes: 502e84e2 ("net: ethernet: mtk_eth_soc: add flow offloading support")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8c1cb87c
    • Russell King (Oracle)
      net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit() · 2960a2d3
      Russell King (Oracle) authored
      As reported on the Turris forum, mvneta provokes kernel warnings in the
      architecture DMA mapping code when mvneta_setup_txqs() fails to
      allocate memory. This happens because when mvneta_cleanup_txqs() is
      called in the mvneta_stop() path, we leave pointers in the structure
      that have been freed.
      
      Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
      allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
      will walk all the queues freeing any non-NULL pointers - which includes
      pointers that were previously freed in mvneta_stop().
      
      Fix this by setting these pointers to NULL to prevent double-freeing
      of the same memory.
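      The general pattern is that a cleanup routine which frees every non-NULL pointer must also clear those pointers. That makes it safe to call again from a later error path. Below is a minimal sketch under simplified assumptions. The struct and function names are hypothetical, the buffer sizes are arbitrary, and userspace calloc()/free() stand in for the DMA allocation the driver actually performs:

```c
#include <stdlib.h>

/* Hypothetical, simplified tx queue with two buffers allocated at open. */
struct txq_stub {
	void *descs;
	void *tso_hdrs;
};

/* Free whatever is allocated and clear the pointers, so a second call
 * (for example from a failed setup after a stop) is harmless. */
void txq_cleanup(struct txq_stub *txq)
{
	free(txq->descs);
	txq->descs = NULL;
	free(txq->tso_hdrs);
	txq->tso_hdrs = NULL;
}

/* Allocate both buffers; on failure, unwind with the same cleanup. */
int txq_setup(struct txq_stub *txq, size_t n)
{
	txq->descs = calloc(n, 16);
	if (!txq->descs)
		goto err;
	txq->tso_hdrs = calloc(n, 128);
	if (!txq->tso_hdrs)
		goto err;
	return 0;
err:
	txq_cleanup(txq);
	return -1;
}
```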
      
      Fixes: 2adb719d ("net: mvneta: Implement software TSO")
      Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2960a2d3
    • Vladimir Oltean
      net: dsa: sync unicast and multicast addresses for VLAN filters too · 64fdc5f3
      Vladimir Oltean authored
      If certain conditions are met, DSA can install all necessary MAC
      addresses on the CPU ports as FDB entries and disable flooding towards
      the CPU (we call this RX filtering).
      
      There is one corner case where this does not work.
      
      ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
      ip link set swp0 master br0 && ip link set swp0 up
      ip link add link swp0 name swp0.100 type vlan id 100
      ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100
      
      Traffic through swp0.100 is broken, because the bridge turns on VLAN
      filtering on the swp0 port (causing RX packets to be classified to the
      FDB database corresponding to the VID from their 802.1Q header), and
      although the 8021q module does call dev_uc_add() towards the real
      device, that API is VLAN-unaware, so it only contains the MAC address,
      not the VID; and DSA's current implementation of ndo_set_rx_mode() is
      only for VID 0 (corresponding to FDB entries which are installed in an
      FDB database which is only hit when the port is VLAN-unaware).
      
      It's interesting to understand why the bridge does not turn on
      IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
      that this is a regression caused by the logic in commit 2796d0c6
      ("bridge: Automatically manage port promiscuous mode."). After all,
      a bridge port needs to have IFF_PROMISC by its very nature - it needs to
      receive and forward frames with a MAC DA different from the bridge
      ports' MAC addresses.
      
      While that may be true, when the bridge is VLAN-aware *and* it has a
      single port, there is no real reason to enable promiscuity even if that
      is an automatic port, with flooding and learning (there is nowhere for
      packets to go except to the BR_FDB_LOCAL entries), and this is how the
      corner case appears. Adding a second automatic interface to the bridge
      would make swp0 promisc as well, and would mask the corner case.
      
      Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
      pass a VLAN ID), the only way to address that problem is to install host
      FDB entries for the Cartesian product of RX filtering MAC addresses and
      VLAN RX filters.
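      Concretely, every RX-filtering MAC address must be installed once per VLAN RX filter, so N addresses and M VIDs yield N * M host FDB entries. Below is a minimal sketch of building that product. The struct and function names are hypothetical. The sketch also assumes that the caller lists VID 0 explicitly among the VIDs, so the VLAN-unaware database keeps its entries:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical host FDB entry: one (MAC, VID) pair for the CPU port. */
struct host_fdb {
	uint8_t addr[ETH_ALEN];
	uint16_t vid;
};

/* Fill out[] with every (address, VID) combination. Returns the number of
 * entries written, or 0 if out[] cannot hold the whole product. */
size_t build_host_fdbs(const uint8_t (*addrs)[ETH_ALEN], size_t n_addrs,
		       const uint16_t *vids, size_t n_vids,
		       struct host_fdb *out, size_t out_cap)
{
	size_t n = 0;

	if (n_addrs * n_vids > out_cap)
		return 0;

	for (size_t i = 0; i < n_addrs; i++) {
		for (size_t j = 0; j < n_vids; j++) {
			memcpy(out[n].addr, addrs[i], ETH_ALEN);
			out[n].vid = vids[j];
			n++;
		}
	}
	return n;
}
```

      In the swp0.100 example, VID 100 would appear in the VID list alongside 0, so the addresses synced via dev_uc_add() would also be installed in the VID 100 database.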
      
      Fixes: 7569459a ("net: dsa: manage flooding on the CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      64fdc5f3