Commits · b9703ed44ffbfba85c103b9de01886a225e14b38 · Kirill Smelkov / linux

21 Apr, 2023 40 commits

netfilter: nf_tables: support for adding new devices to an existing netdev chain · b9703ed4
Pablo Neira Ayuso authored Apr 21, 2023
```
This patch allows users to add devices to an existing netdev chain.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
b9703ed4

netfilter: nf_tables: rename function to destroy hook list · cdc32546

Pablo Neira Ayuso authored Apr 21, 2023

Rename nft_flowtable_hooks_destroy() by nft_hooks_destroy() to prepare
for netdev chain device updates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cdc32546

netfilter: nf_tables: do not send complete notification of deletions · 28339b21

Pablo Neira Ayuso authored Apr 21, 2023

In most cases, table, name and handle is sufficient for userspace to
identify an object that has been deleted. Skipping unneeded fields in
the netlink attributes in the message saves bandwidth (ie. less chances
of hitting ENOBUFS).

Rules are an exception: the existing userspace monitor code relies on
the rule definition. This exception can be removed by implementing a
rule cache in userspace, this is already supported by the tracing
infrastructure.

Regarding flowtables, incremental deletion of devices is possible.
Skipping a full notification allows userspace to differentiate between
flowtable removal and incremental removal of devices.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

28339b21

netfilter: nf_tables: extended netlink error reporting for netdevice · c3c060ad

Pablo Neira Ayuso authored Apr 21, 2023

Flowtable and netdev chains are bound to one or several netdevice,
extend netlink error reporting to specify the the netdevice that
triggers the error.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c3c060ad

ipvs: Correct spelling in comments · c7d15aaa

Simon Horman authored Apr 17, 2023

Correct some spelling errors flagged by codespell and found by inspection.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c7d15aaa

ipvs: Remove {Enter,Leave}Function · 210ffe4a

Simon Horman authored Apr 17, 2023

Remove EnterFunction and LeaveFunction.

These debugging macros seem well past their use-by date.  And seem to
have little value these days. Removing them allows some trivial cleanup
of some exit paths for some functions. These are also included in this
patch. There is likely scope for further cleanup of both debugging and
unwind paths. But let's leave that for another day.

Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG
is enabled. Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

210ffe4a

ipvs: Consistently use array_size() in ip_vs_conn_init() · 28065493

Simon Horman authored Apr 17, 2023

Consistently use array_size() to calculate the size of ip_vs_conn_tab
in bytes.

Flagged by Coccinelle:
 WARNING: array_size is already used (line 1498) to compute the same size

No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

28065493

ipvs: Update width of source for ip_vs_sync_conn_options · e3478c68

Simon Horman authored Apr 17, 2023

In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options.
That structure looks like this:

struct ip_vs_sync_conn_options {
        struct ip_vs_seq        in_seq;
        struct ip_vs_seq        out_seq;
};

The source of the copy is the in_seq field of struct ip_vs_conn.  Whose
type is struct ip_vs_seq. Thus we can see that the source - is not as
wide as the amount of data copied, which is the width of struct
ip_vs_sync_conn_option.

The copy is safe because the next field in is another struct ip_vs_seq.
Make use of struct_group() to annotate this.

Flagged by gcc-13 as:

 In file included from ./include/linux/string.h:254,
                  from ./include/linux/bitmap.h:11,
                  from ./include/linux/cpumask.h:12,
                  from ./arch/x86/include/asm/paravirt.h:17,
                  from ./arch/x86/include/asm/cpuid.h:62,
                  from ./arch/x86/include/asm/processor.h:19,
                  from ./arch/x86/include/asm/timex.h:5,
                  from ./include/linux/timex.h:67,
                  from ./include/linux/time32.h:13,
                  from ./include/linux/time.h:60,
                  from ./include/linux/stat.h:19,
                  from ./include/linux/module.h:13,
                  from net/netfilter/ipvs/ip_vs_sync.c:38:
 In function 'fortify_memcpy_chk',
     inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3:
 ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
   529 |                         __read_overflow2_field(q_size_field, size);
       |

Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e3478c68

netfilter: nf_tables: do not store rule in traceinfo structure · 46df4175

Florian Westphal authored Apr 14, 2023

pass it as argument instead.  This reduces size of traceinfo to
16 bytes.  Total stack usage:

 nf_tables_core.c:252 nft_do_chain    304     static

While its possible to also pass basechain as argument, doing so
increases nft_do_chaininfo function size.

Unlike pktinfo/verdict/rule the basechain info isn't used in
the expression evaluation path. gcc places it on the stack, which
results in extra push/pop when it gets passed to the trace helpers
as argument rather than as part of the traceinfo structure.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

46df4175

netfilter: nf_tables: do not store verdict in traceinfo structure · 0a202145

Florian Westphal authored Apr 14, 2023

Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes:

nf_tables_core.c:256 nft_do_chain    312     static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0a202145

netfilter: nf_tables: do not store pktinfo in traceinfo structure · 698bb828

Florian Westphal authored Apr 14, 2023

pass it as argument.  No change in object size.

stack usage decreases by 8 byte:
 nf_tables_core.c:254  nft_do_chain       320     static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

698bb828

netfilter: nf_tables: remove unneeded conditional · 2a1d6abd

Florian Westphal authored Apr 14, 2023

This helper is inlined, so keep it as small as possible.

If the static key is true, there is only a very small chance
that info->trace is false:

1. tracing was enabled at this very moment, the static key was
   updated to active right after nft_do_table was called.

2. tracing was disabled at this very moment.
   trace->info is already false, the static key is about to
   be patched to false soon.

In both cases, no event will be sent because info->trace
is false (checked in noinline slowpath). info->nf_trace is irrelevant.

The nf_trace update is redunant in this case, but this will only
happen for short duration, when static key flips.

       text  data   bss   dec   hex filename
old:   2980   192    32  3204   c84 nf_tables_core.o
new:   2964   192    32  3188   c74i nf_tables_core.o
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2a1d6abd

netfilter: nf_tables: make validation state per table · 00c320f9

Florian Westphal authored Apr 13, 2023

We only need to validate tables that saw changes in the current
transaction.

The existing code revalidates all tables, but this isn't needed as
cross-table jumps are not allowed (chains have table scope).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

00c320f9

netfilter: nf_tables: don't write table validation state without mutex · 9a32e985

Florian Westphal authored Apr 13, 2023

The ->cleanup callback needs to be removed, this doesn't work anymore as
the transaction mutex is already released in the ->abort function.

Just do it after a successful validation pass, this either happens
from commit or abort phases where transaction mutex is held.

Fixes: f102d66b ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9a32e985

netfilter: nf_tables: don't store chain address on jump · 63e9bbbc

Florian Westphal authored Apr 11, 2023

Now that the rule trailer/end marker and the rcu head reside in the
same structure, we no longer need to save/restore the chain pointer
when performing/returning from a jump.

We can simply let the trace infra walk the evaluated rule until it
hits the end marker and then fetch the chain pointer from there.

When the rule is NULL (policy tracing), then chain and basechain
pointers were already identical, so just use the basechain.

This cuts size of jumpstack in half, from 256 to 128 bytes in 64bit,
scripts/stackusage says:

nf_tables_core.c:251 nft_do_chain    328     static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

63e9bbbc

netfilter: nf_tables: don't store address of last rule on jump · d4d89e65

Florian Westphal authored Apr 11, 2023

Walk the rule headers until the trailer one (last_bit flag set) instead
of stopping at last_rule address.

This avoids the need to store the address when jumping to another chain.

This cuts size of jumpstack array by one third, on 64bit from
384 to 256 bytes.  Still, stack usage is still quite large:

scripts/stackusage:
nf_tables_core.c:258 nft_do_chain    496     static

Next patch will also remove chain pointer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d4d89e65

netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker · e38fbfa9

Florian Westphal authored Apr 11, 2023

In order to free the rules in a chain via call_rcu, the rule array used
to stash a rcu_head and space for a pointer at the end of the rule array.

When the current nft_rule_dp blob format got added in
2c865a8a ("netfilter: nf_tables: add rule blob layout"), this results
in a double-trailer:

  size (unsigned long)
  struct nft_rule_dp
    struct nft_expr
         ...
    struct nft_rule_dp
     struct nft_expr
         ...
    struct nft_rule_dp (is_last=1) // Trailer

The trailer, struct nft_rule_dp (is_last=1), is not accounted for in size,
so it can be located via start_addr + size.

Because the rcu_head is stored after 'start+size' as well this means the
is_last trailer is *aliased* to the rcu_head (struct nft_rules_old).

This is harmless, because at this time the nft_do_chain function never
evaluates/accesses the trailer, it only checks the address boundary:

        for (; rule < last_rule; rule = nft_rule_next(rule)) {
...

But this way the last_rule address has to be stashed in the jump
structure to restore it after returning from a chain.

nft_do_chain stack usage has become way too big, so put it on a diet.

Without this patch is impossible to use
        for (; !rule->is_last; rule = nft_rule_next(rule)) {

... because on free, the needed update of the rcu_head will clobber the
nft_rule_dp is_last bit.

Furthermore, also stash the chain pointer in the trailer, this allows
to recover the original chain structure from nf_tables_trace infra
without a need to place them in the jump struct.

After this patch it is trivial to diet the jump stack structure,
done in the next two patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e38fbfa9

Merge tag 'wireless-next-2023-04-21' of... · ca288965

Jakub Kicinski authored Apr 21, 2023

Merge tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.4

Most likely the last -next pull request for v6.4. We have changes all
over. rtw88 now supports SDIO bus and iwlwifi continues to work on
Wi-Fi 7 support. Not much stack changes this time.

Major changes:

cfg80211/mac80211
 - fix some Fine Time Measurement (FTM) frames not being bufferable
 - flush frames before key removal to avoid potential unencrypted
   transmission depending on the hardware design

iwlwifi
 - preparation for Wi-Fi 7 EHT and multi-link support

rtw88
 - SDIO bus support
 - RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support

rtw89
 - framework firmware backwards compatibility

brcmfmac
 - Cypress 43439 SDIO support

mt76
 - mt7921 P2P support
 - mt7996 mesh A-MSDU support
 - mt7996 EHT support
 - mt7996 coredump support

wcn36xx
 - support for pronto v3 hardware

ath11k
 - PCIe DeviceTree bindings
 - WCN6750: enable SAR support

ath10k
 - convert DeviceTree bindings to YAML

* tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits)
  wifi: rtw88: Update spelling in main.h
  wifi: airo: remove ISA_DMA_API dependency
  wifi: rtl8xxxu: Simplify setting the initial gain
  wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear}
  wifi: rtl8xxxu: Don't print the vendor/product/serial
  wifi: rtw88: Fix memory leak in rtw88_usb
  wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant
  wifi: rtw88: set pkg_type correctly for specific rtw8821c variants
  wifi: rtw88: rtw8821c: Fix rfe_option field width
  wifi: rtw88: usb: fix priority queue to endpoint mapping
  wifi: rtw88: 8822c: add iface combination
  wifi: rtw88: handle station mode concurrent scan with AP mode
  wifi: rtw88: prevent scan abort with other VIFs
  wifi: rtw88: refine reserved page flow for AP mode
  wifi: rtw88: disallow PS during AP mode
  wifi: rtw88: 8822c: extend reserved page number
  wifi: rtw88: add port switch for AP mode
  wifi: rtw88: add bitmap for dynamic port settings
  wifi: rtw89: mac: use regular int as return type of DLE buffer request
  wifi: mac80211: remove return value check of debugfs_create_dir()
  ...
====================

Link: https://lore.kernel.org/r/20230421104726.800BCC433D2@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ca288965

net/packet: support mergeable feature of virtio · dfc39d40

Jianfeng Tan authored Apr 19, 2023

Packet sockets, like tap, can be used as the backend for kernel vhost.
In packet sockets, virtio net header size is currently hardcoded to be
the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
always the case: some virtio features, such as mrg_rxbuf, need virtio
net header to be 12-byte long.

Mergeable buffers, as a virtio feature, is worthy of supporting: packets
that are larger than one-mbuf size will be dropped in vhost worker's
handle_rx if mrg_rxbuf feature is not used, but large packets
cannot be avoided and increasing mbuf's size is not economical.

With this virtio feature enabled by virtio-user, packet sockets with
hardcoded 10-byte virtio net header will parse mac head incorrectly in
packet_snd by taking the last two bytes of virtio net header as part of
mac header.
This incorrect mac header parsing will cause packet to be dropped due to
invalid ether head checking in later under-layer device packet receiving.

By adding extra field vnet_hdr_sz with utilizing holes in struct
packet_sock to record currently used virtio net header size and supporting
extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
sockets can know the exact length of virtio net header that virtio user
gives.
In packet_snd, tpacket_snd and packet_recvmsg, instead of using
hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
corresponding packet_sock, and parse mac header correctly based on this
information to avoid the packets being mistakenly dropped.
Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com>
Co-developed-by: Anqi Shen <amy.saq@antgroup.com>
Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfc39d40

Merge branch 'mlx5-ipsec-fixes' · 156c9398

David S. Miller authored Apr 21, 2023

Leon Romanovsky says:

====================
Fixes to mlx5 IPsec implementation

This small patchset includes various fixes and one refactoring patch
which I collected for the features sent in this cycle, with one exception -
first patch.

First patch fixes code which was introduced in previous cycle, however I
was able to trigger FW error only in custom debug code, so don't see a
need to send it to net-rc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

156c9398

net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macs · 45fd01f2

Leon Romanovsky authored Apr 20, 2023

ARP discovery code has same logic for RX and TX flows, but with
different source and destination fields. Instead of duplicating
same code in mlx5e_ipsec_init_macs, let's refactor.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45fd01f2

net/mlx5e: Properly release work data structure · 94edec44

Leon Romanovsky authored Apr 20, 2023

There are some flows in which work structure is not allocated at all
and it is needed to be checked prior release of data structure.

 general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
 CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: events xfrm_state_gc_task
 RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core]
 Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89
 RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206
 RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000
 RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050
 RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000
 R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000
 R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118
 FS:  0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:

  ___xfrm_state_destroy+0x3c8/0x5e0
  xfrm_state_gc_task+0xf6/0x140
  ? ___xfrm_state_destroy+0x5e0/0x5e0
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? spin_bug+0x1d0/0x1d0
  worker_thread+0x59d/0xec0
  ? __kthread_parkme+0xd9/0x1d0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30

 Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core]
 ---[ end trace 0000000000000000 ]---

Fixes: 4562116f ("net/mlx5e: Generalize IPsec work structs")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

94edec44

net/mlx5e: Compare all fields in IPv6 address · 3198ae7d

Leon Romanovsky authored Apr 20, 2023

Fix size argument in memcmp to compare whole IPv6 address.

Fixes: b3beba1f ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3198ae7d

net/mlx5e: Don't overwrite extack message returned from IPsec SA validator · 697b3518

Leon Romanovsky authored Apr 20, 2023

Addition of new err_xfrm label caused to error messages be overwritten.
Fix it by using proper NL_SET_ERR_MSG_WEAK_MOD macro together with change
in a default message.

Fixes: aa8bd0c9 ("net/mlx5e: Support IPsec acquire default SA")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

697b3518

net/mlx5e: Fix FW error while setting IPsec policy block action · e239e31a

Leon Romanovsky authored Apr 20, 2023

When trying to set IPsec policy block action the following error is
generated:

 mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
	status bad parameter(0x3), syndrome (0x8708c3), err(-22)

This error means that drop action is not allowed when modify action is
set, so update the code to skip modify header for XFRM_POLICY_BLOCK action.

Fixes: 67212396 ("net/mlx5e: Skip IPsec encryption for TX path without matching policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e239e31a

net: stmmac:fix system hang when setting up tag_8021q VLAN for DSA ports · 35226750

Yan Wang authored Apr 19, 2023

The system hang because of dsa_tag_8021q_port_setup()->
				stmmac_vlan_rx_add_vid().

I found in stmmac_drv_probe() that cailing pm_runtime_put()
disabled the clock.

First, when the kernel is compiled with CONFIG_PM=y,The stmmac's
resume/suspend is active.

Secondly,stmmac as DSA master,the dsa_tag_8021q_port_setup() function
will callback stmmac_vlan_rx_add_vid when DSA dirver starts. However,
The system is hanged for the stmmac_vlan_rx_add_vid() accesses its
registers after stmmac's clock is closed.

I would suggest adding the pm_runtime_resume_and_get() to the
stmmac_vlan_rx_add_vid().This guarantees that resuming clock output
while in use.

Fixes: b3dcb312 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Yan Wang <rk.code@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35226750

Merge branch 'pds_core' · d8bb3824

David S. Miller authored Apr 21, 2023

Shannon Nelson says:

====================
pds_core driver

Summary:
--------
This patchset implements a new driver for use with the AMD/Pensando
Distributed Services Card (DSC), intended to provide core configuration
services through the auxiliary_bus and through a couple of EXPORTed
functions for use initially in VFio and vDPA feature specific drivers.

To keep this patchset to a manageable size, the pds_vdpa and pds_vfio
drivers have been split out into their own patchsets to be reviewed
separately.

Detail:
-------
AMD/Pensando is making available a new set of devices for supporting vDPA,
VFio, and potentially other features in the Distributed Services Card
(DSC).  These features are implemented through a PF that serves as a Core
device for controlling and configuring its VF devices.  These VF devices
have separate drivers that use the auxiliary_bus to work through the Core
device as the control path.

Currently, the DSC supports standard ethernet operations using the
ionic driver.  This is not replaced by the Core-based devices - these
new devices are in addition to the existing Ethernet device.  Typical DSC
configurations will include both PDS devices and Ionic Eth devices.
However, there is a potential future path for ethernet services to come
through this device as well.

The Core device is a new PCI PF/VF device managed by a new driver
'pds_core'.  The PF device has access to an admin queue for configuring
the services used by the VFs, and sets up auxiliary_bus devices for each
vDPA VF for communicating with the drivers for the vDPA devices.  The VFs
may be for VFio or vDPA, and other services in the future; these VF types
are selected as part of the DSC internal FW configurations, which is out
of the scope of this patchset.

When the vDPA support set is enabled in the core PF through its devlink
param, auxiliary_bus devices are created for each VF that supports the
feature.  The vDPA driver then connects to and uses this auxiliary_device
to do control path configuration through the PF device.  This can then be
used with the vdpa kernel module to provide devices for virtio_vdpa kernel
module for host interfaces, or vhost_vdpa kernel module for interfaces
exported into your favorite VM.

A cheap ASCII diagram of a vDPA instance looks something like this:

                                ,----------.
                                |   vdpa   |
                                '----------'
                                  |     ||
                                 ctl   data
                                  |     ||
                          .----------.  ||
                          | pds_vdpa |  ||
                          '----------'  ||
                               |        ||
                       pds_core.vDPA.1  ||
                               |        ||
                    .---------------.   ||
                    |   pds_core    |   ||
                    '---------------'   ||
                        ||         ||   ||
                      09:00.0      09:00.1
        == PCI ============================================
                        ||            ||
                   .----------.   .----------.
            ,------|    PF    |---|    VF    |-------,
            |      '----------'   '----------'       |
            |                  DSC                   |
            |                                        |
            ------------------------------------------

Changes:
  v11:
 - change strncpy to strscpy
Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202304181137.WaZTYyAa-lkp@intel.com/

  v10:
Link: https://lore.kernel.org/netdev/20230418003228.28234-1-shannon.nelson@amd.com/
 - remove CONFIG_DEBUG_FS guard static inline stuff
 - remove unnecessary 0 and null initializations
 - verify in driver load that PDS_CORE_DRV_NAME matches KBUILD_MODNAME
 - remove debugfs irqs_show(), redundant with /proc
 - return -ENOMEM if intr_info = kcalloc() fails
 - move the status code enum into pds_core_if.h as part of API definition
 - fix up one place in pdsc_devcmd_wait() we're using the status codes where we could use the errno
 - remove redundant calls to flush_workqueue()
 - grab config_lock before testing state bits in pdsc_fw_reporter_diagnose()
 - change pdsc_color_match() to return bool
 - remove useless VIF setup loop and just setup vDPA services for now
 - remove pf pointer from struct padev and have clients use pci_physfn()
 - drop use of "vf" in auxdev.c function names, make more generic
 - remove last of client ops struct and simply export the functions
 - drop drivers@pensando.io from MAINTAINERS and add new include dir
 - include dynamic_debug.h in adminq.c to protect dynamic_hex_dump()
 - fixed fw_slot type from u8 to int for handling error returns
 - fixed comment spelling
 - changed void arg in pdsc_adminq_post() to struct pdsc *

  v9:
Link: https://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/
 - change pdsc field name id to uid to clarify the unique id used for aux device
 - remove unnecessary pf->state and other checks in aux device creation
 - hardcode fw slotnames for devlink info, don't use strings from FW
 - handle errors from PDS_CORE_CMD_INIT devcmd call
 - tighten up health thread use of config_lock
 - remove pdsc_queue_health_check() layer over queuing health check
 - start pds_core.rst file in first patch, add to it incrementally
 - give more user interaction info in commit messages
 - removed a few more extraneous includes

  v8:
Link: https://lore.kernel.org/netdev/20230330234628.14627-1-shannon.nelson@amd.com/
 - fixed deadlock problem, use devl_health_reporter_destroy() when devlink is locked
 - don't clear client_id until after auxiliary_device_uninit()

  v7:
Link: https://lore.kernel.org/netdev/20230330192313.62018-1-shannon.nelson@amd.com/
 - use explicit devlink locking and devl_* APIs
 - move some of devlink setup logic into probe and remove
 - use debugfs_create_u{type}() for state and queue head and tail
 - add include for linux/vmalloc.h
Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202303260420.Tgq0qobF-lkp@intel.com/

  v6:
Link: https://lore.kernel.org/netdev/20230324190243.27722-1-shannon.nelson@amd.com/
 - removed version.h include noticed by kernel test robot's version check
Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202303230742.pX3ply0t-lkp@intel.com/
 - fixed up the more egregious checkpatch line length complaints
 - make sure pdsc_auxbus_dev_register() checks padev pointer errcode

  v5:
Link: https://lore.kernel.org/netdev/20230322185626.38758-1-shannon.nelson@amd.com/
 - added devlink health reporter for FW issues
 - removed asic_type, asic_rev, serial_num, fw_version from debugfs as
   they are available through other means
 - trimed OS info in pdsc_identify(), we don't need to send that much info to the FW
 - removed reg/unreg from auxbus client API, they are now in the core when VF
   is started
 - removed need for pdsc definition in client by simplifying the padev to only carry
   struct pci_dev pointers rather than full struct pdsc to the pf and vf
 - removed the unused pdsc argument in pdsc_notify()
 - moved include/linux/pds/pds_core.h to driver/../pds_core/core.h
 - restored a few pds_core_if.h interface values and structs that are shared
   with FW source
 - moved final config_lock unlock to before tear down of timer and workqueue
   to be sure there are no deadlocks while waiting for any stragglers
 - changed use of PAGE_SIZE to local PDS_PAGE_SIZE to keep with FW layout needs
   without regard to kernel PAGE_SIZE configuration
 - removed the redundant *adminqcq argument from pdsc_adminq_post()

  v4:
Link: https://lore.kernel.org/netdev/20230308051310.12544-1-shannon.nelson@amd.com/
 - reworked to attach to both Core PF and vDPA VF PCI devices
 - now creates auxiliary_device as part of each VF PCI probe, removes them on PCI remove
 - auxiliary devices now use simple unique id rather than PCI address for identifier
 - replaced home-grown event publishing with kernel-based notifier service
 - dropped live_migration parameter, not needed when not creating aux device for it
 - replaced devm_* functions with traditional interfaces
 - added MAINTAINERS entry
 - removed lingering traces of set/get_vf attribute adminq commands
 - trimmed some include lists
 - cleaned a kernel test robot complaint about a stray unused variable
        Link: https://lore.kernel.org/oe-kbuild-all/202302181049.yeUQMeWY-lkp@intel.com/

  v3:
Link: https://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/
 - changed names from "pensando" to "amd" and updated copyright strings
 - dropped the DEVLINK_PARAM_GENERIC_ID_FW_BANK for future development
 - changed the auxiliary device creation to be triggered by the
   PCI bus event BOUND_DRIVER, and torn down at UNBIND_DRIVER in order
   to properly handle users using the sysfs bind/unbind functions
 - dropped some noisy log messages
 - rebased to current net-next

  RFC to v2:
Link: https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
 - added separate devlink param patches for DEVLINK_PARAM_GENERIC_ID_ENABLE_MIGRATION
   and DEVLINK_PARAM_GENERIC_ID_FW_BANK, and dropped the driver specific implementations
 - updated descriptions for the new devlink parameters
 - dropped netdev support
 - dropped vDPA patches, will followup later
 - separated fw update and fw bank select into their own patches

  RFC:
Link: https://lore.kernel.org/netdev/20221118225656.48309-1-snelson@pensando.io/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d8bb3824

pds_core: Kconfig and pds_core.rst · ddbcb220

Shannon Nelson authored Apr 19, 2023

Remaining documentation and Kconfig hook for building the driver.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddbcb220

pds_core: publish events to the clients · d24c2827

Shannon Nelson authored Apr 19, 2023

When the Core device gets an event from the device, or notices
the device FW to be up or down, it needs to send those events
on to the clients that have an event handler.  Add the code to
pass along the events to the clients.

The entry points pdsc_register_notify() and pdsc_unregister_notify()
are EXPORTed for other drivers that want to listen for these events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d24c2827

pds_core: add the aux client API · 10659034

Shannon Nelson authored Apr 19, 2023

Add the client API operations for running adminq commands.
The core registers the client with the FW, then the client
has a context for requesting adminq services.  We expect
to add additional operations for other clients, including
requesting additional private adminqs and IRQs, but don't have
the need yet.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10659034

pds_core: devlink params for enabling VIF support · 40ced894

Shannon Nelson authored Apr 19, 2023

Add the devlink parameter switches so the user can enable
the features supported by the VFs.  The only feature supported
at the moment is vDPA.

Example:
    devlink dev param set pci/0000:2b:00.0 \
	    name enable_vnet cmode runtime value true
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

40ced894

pds_core: add auxiliary_bus devices · 4569cce4

Shannon Nelson authored Apr 19, 2023

An auxiliary_bus device is created for each vDPA type VF at VF
probe and destroyed at VF remove.  The aux device name comes
from the driver name + VIF type + the unique id assigned at PCI
probe.  The VFs are always removed on PF remove, so there should
be no issues with VFs trying to access missing PF structures.

The auxiliary_device names will look like "pds_core.vDPA.nn"
where 'nn' is the VF's uid.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4569cce4

pds_core: add initial VF device handling · f53d9311

Shannon Nelson authored Apr 19, 2023

This is the initial VF PCI driver framework for the new
pds_vdpa VF device, which will work in conjunction with an
auxiliary_bus client of the pds_core driver.  This does the
very basics of registering for the new VF device, setting
up debugfs entries, and registering with devlink.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f53d9311

pds_core: set up the VIF definitions and defaults · 65e0185a

Shannon Nelson authored Apr 19, 2023

The Virtual Interfaces (VIFs) supported by the DSC's
configuration (vDPA, Eth, RDMA, etc) are reported in the
dev_ident struct and made visible in debugfs.  At this point
only vDPA is supported in this driver so we only setup
devices for that feature.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

65e0185a

pds_core: add FW update feature to devlink · 49ce92fb

Shannon Nelson authored Apr 19, 2023

Add in the support for doing firmware updates.  Of the two
main banks available, a and b, this updates the one not in
use and then selects it for the next boot.

Example:
    devlink dev flash pci/0000:b2:00.0 \
	    file pensando/dsc_fw_1.63.0-22.tar
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

49ce92fb

pds_core: Add adminq processing and commands · 01ba61b5

Shannon Nelson authored Apr 19, 2023

Add the service routines for submitting and processing
the adminq messages and for handling notifyq events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01ba61b5

pds_core: set up device and adminq · 45d76f49

Shannon Nelson authored Apr 19, 2023

Set up the basic adminq and notifyq queue structures.  These are
used mostly by the client drivers for feature configuration.
These are essentially the same adminq and notifyq as in the
ionic driver.

Part of this includes querying for device identity and FW
information, so we can make that available to devlink dev info.

  $ devlink dev info pci/0000:b5:00.0
  pci/0000:b5:00.0:
    driver pds_core
    serial_number FLM18420073
    versions:
        fixed:
          asic.id 0x0
          asic.rev 0x0
        running:
          fw 1.51.0-73
        stored:
          fw.goldfw 1.15.9-C-22
          fw.mainfwa 1.60.0-73
          fw.mainfwb 1.60.0-57
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

45d76f49

pds_core: add devlink health facilities · 25b450c0

Shannon Nelson authored Apr 19, 2023

Add devlink health reporting on top of our fw watchdog.

Example:
  # devlink health show pci/0000:2b:00.0 reporter fw
  pci/0000:2b:00.0:
    reporter fw
      state healthy error 0 recover 0
  # devlink health diagnose pci/0000:2b:00.0 reporter fw
   Status: healthy State: 1 Generation: 0 Recoveries: 0
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

25b450c0

pds_core: health timer and workqueue · c2dbb090

Shannon Nelson authored Apr 19, 2023

Add in the periodic health check and the related workqueue,
as well as the handlers for when a FW reset is seen.

The firmware is polled every 5 seconds to be sure that it is
still alive and that the FW generation didn't change.

The alive check looks to see that the PCI bus is still readable
and the fw_status still has the RUNNING bit on.  If not alive,
the driver stops activity and tears things down.  When the FW
recovers and the alive check again succeeds, the driver sets
back up for activity.

The generation check looks at the fw_generation to see if it
has changed, which can happen if the FW crashed and recovered
or was updated in between health checks.  If changed, the
driver counts that as though the alive test failed and forces
the fw_down/fw_up cycle.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2dbb090

pds_core: add devcmd device interfaces · 523847df

Shannon Nelson authored Apr 19, 2023

The devcmd interface is the basic connection to the device through the
PCI BAR for low level identification and command services.  This does
the early device initialization and finds the identity data, and adds
devcmd routines to be used by later driver bits.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

523847df