Commits · 6baeb3951c271cff30828c4763fa1362da56454a · Kirill Smelkov / linux

31 Aug, 2021 4 commits

net: bridge: use mld2r_ngrec instead of icmpv6_dataun · 6baeb395

MichelleJin authored Aug 29, 2021

br_ip6_multicast_mld2_report function uses icmp6h
to parse mld2_report packet.

mld2r_ngrec defines mld2r_hdr.icmp6_dataun.un_data16[1]
in include/net/mld.h.

So, it is more compact to use mld2r rather than icmp6h.

By doing printk test, it is confirmed that
icmp6h->icmp6_dataun.un_data16[1] and mld2r->mld2r_ngrec are
indeed equivalent.

Also, sizeof(*mld2r) and sizeof(*icmp6h) are equivalent, too.
Signed-off-by: MichelleJin <shjy180909@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6baeb395

net: qualcomm: fix QCA7000 checksum handling · 429205da

Stefan Wahren authored Aug 28, 2021

Based on tests the QCA7000 doesn't support checksum offloading. So assume
ip_summed is CHECKSUM_NONE and let the kernel take care of the checksum
handling. This fixes data transfer issues in noisy environments.
Reported-by: Michael Heimpold <michael.heimpold@in-tech.com>
Fixes: 291ab06e ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

429205da

net: pasemi: Remove usage of the deprecated "pci-dma-compat.h" API · a16ef91a

Christophe JAILLET authored Aug 28, 2021

In [1], Christoph Hellwig has proposed to remove the wrappers in
include/linux/pci-dma-compat.h.

Some reasons why this API should be removed have been given by Julia
Lawall in [2].

A coccinelle script has been used to perform the needed transformation
Only relevant parts are given below.

An 'unlikely()' has been removed when calling 'dma_mapping_error()' because
this function, which is inlined, already has such an annotation.

@@ @@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@ @@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

[1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
[2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/bc6cd281eae024b26fd9c7ef6678d2d1dc9d74fd.1630150008.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a16ef91a

net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed · c6607012

Xiyu Yang authored Aug 29, 2021

The reference counting issue happens in one exception handling path of
cbq_change_class(). When failing to get tcf_block, the function forgets
to decrease the refcount of "rtab" increased by qdisc_put_rtab(),
causing a refcount leak.

Fix this issue by jumping to "failure" label when get tcf_block failed.

Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cnSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c6607012

30 Aug, 2021 36 commits

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 19a31d79

Jakub Kicinski authored Aug 30, 2021

Daniel Borkmann says:

====================
bpf-next 2021-08-31

We've added 116 non-merge commits during the last 17 day(s) which contain
a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).

The main changes are:

1) Add opaque bpf_cookie to perf link which the program can read out again,
   to be used in libbpf-based USDT library, from Andrii Nakryiko.

2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.

3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.

4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
   to another congestion control algorithm during init, from Martin KaFai Lau.

5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.

6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.

7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
   progs, from Xu Liu and Stanislav Fomichev.

8) Support for __weak typed ksyms in libbpf, from Hao Luo.

9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.

10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.

11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.

12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
    Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others.

13) Another big batch to revamp XDP samples in order to give them consistent look
    and feel, from Kumar Kartikeya Dwivedi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
  MAINTAINERS: Remove self from powerpc BPF JIT
  selftests/bpf: Fix potential unreleased lock
  samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
  selftests/bpf: Reduce more flakyness in sockmap_listen
  bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
  bpf: selftests: Add dctcp fallback test
  bpf: selftests: Add connect_to_fd_opts to network_helpers
  bpf: selftests: Add sk_state to bpf_tcp_helpers.h
  bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
  selftests: xsk: Preface options with opt
  selftests: xsk: Make enums lower case
  selftests: xsk: Generate packets from specification
  selftests: xsk: Generate packet directly in umem
  selftests: xsk: Simplify cleanup of ifobjects
  selftests: xsk: Decrease sending speed
  selftests: xsk: Validate tx stats on tx thread
  selftests: xsk: Simplify packet validation in xsk tests
  selftests: xsk: Rename worker_* functions that are not thread entry points
  selftests: xsk: Disassociate umem size with packets sent
  selftests: xsk: Remove end-of-test packet
  ...
====================

Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

19a31d79

sch_htb: Fix inconsistency when leaf qdisc creation fails · ca49bfd9

Maxim Mikityanskiy authored Aug 26, 2021

In HTB offload mode, qdiscs of leaf classes are grafted to netdev
queues. sch_htb expects the dev_queue field of these qdiscs to point to
the corresponding queues. However, qdisc creation may fail, and in that
case noop_qdisc is used instead. Its dev_queue doesn't point to the
right queue, so sch_htb can lose track of used netdev queues, which will
cause internal inconsistencies.

This commit fixes this bug by keeping track of the netdev queue inside
struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
new field, the two values are synced on writes, and WARNs are added to
assert equality of the two values.

The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
the driver used to pass the old and new queue IDs to sch_htb. Now that
there is a new field (offload_queue) in struct htb_class that needs to
be updated on this operation, the driver will pass the old class ID to
sch_htb instead (it already knows the new class ID).

Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ca49bfd9

MAINTAINERS: Remove self from powerpc BPF JIT · fca35b11

Sandipan Das authored Aug 27, 2021

Stepping down as I haven't had a chance to look into the powerpc
BPF JIT compilers for a while.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210827111905.396145-1-sandipan@linux.ibm.com

fca35b11

net: ipv4: Fix the warning for dereference · 1b9fbe81

Yajun Deng authored Aug 30, 2021

Add a if statements to avoid the warning.

Dan Carpenter report:
The patch faf482ca: "net: ipv4: Move ip_options_fragment() out of
loop" from Aug 23, 2021, leads to the following Smatch complaint:

    net/ipv4/ip_output.c:833 ip_do_fragment()
    warn: variable dereferenced before check 'iter.frag' (see line 828)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: faf482ca ("net: ipv4: Move ip_options_fragment() out of loop")
Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#tSigned-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b9fbe81

net: qrtr: make checks in qrtr_endpoint_post() stricter · aaa8e492

Dan Carpenter authored Aug 30, 2021

These checks are still not strict enough. The main problem is that if
"cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is
guaranteed to be 4 but we need to be at least 16 bytes. In fact, we
can reject everything smaller than sizeof(*pkt) which is 20 bytes.

Also I don't like the ALIGN(size, 4). It's better to just insist that
data is needs to be aligned at the start.

Fixes: 0baa99ee ("net: qrtr: Allow non-immediate node routing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aaa8e492

fix array-index-out-of-bounds in taprio_change · efe487fc

Haimin Zhang authored Aug 30, 2021

syzbot report an array-index-out-of-bounds in taprio_change
index 16 is out of range for type '__u16 [16]'
that's because mqprio->num_tc is lager than TC_MAX_QUEUE,so we check
the return value of netdev_set_num_tc.

Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efe487fc

net: fix NULL pointer reference in cipso_v4_doi_free · e842cb60

王贇 authored Aug 30, 2021

In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc
failed, we sometime observe panic:

  BUG: kernel NULL pointer dereference, address:
  ...
  RIP: 0010:cipso_v4_doi_free+0x3a/0x80
  ...
  Call Trace:
   netlbl_cipsov4_add_std+0xf4/0x8c0
   netlbl_cipsov4_add+0x13f/0x1b0
   genl_family_rcv_msg_doit.isra.15+0x132/0x170
   genl_rcv_msg+0x125/0x240

This is because in cipso_v4_doi_free() there is no check
on 'doi_def->map.std' when doi_def->type got value 1, which
is possibe, since netlbl_cipsov4_add_std() haven't initialize
it before alloc 'doi_def->map.std'.

This patch just add the check to prevent panic happen in similar
cases.
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e842cb60

Merge branch 'inet-exceptions-less-predictable' · 63cad4c7

David S. Miller authored Aug 30, 2021

Eric Dumazet says:

====================
inet: make exception handling less predictible

This second round of patches is addressing Keyu Man recommendations
to make linux hosts more robust against a class of brute force attacks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

63cad4c7

ipv4: make exception cache less predictible · 67d6d681

Eric Dumazet authored Aug 29, 2021

Even after commit 6457378f ("ipv4: use siphash instead of Jenkins in
fnhe_hashfun()"), an attacker can still use brute force to learn
some secrets from a victim linux host.

One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.

Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.

After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.

This is slightly increasing memory size used by the hash table,
by 50% in average, we do not expect this to be a problem.

This patch is more complex than the prior one (IPv6 equivalent),
because IPv4 was reusing the oldest entry.
Since we need to be able to evict more than one entry per
update_or_create_fnhe() call, I had to replace
fnhe_oldest() with fnhe_remove_oldest().

Also note that we will queue extra kfree_rcu() calls under stress,
which hopefully wont be a too big issue.

Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

67d6d681

ipv6: make exception cache less predictible · a00df2ca

Eric Dumazet authored Aug 29, 2021

Even after commit 4785305c ("ipv6: use siphash in rt6_exception_hash()"),
an attacker can still use brute force to learn some secrets from a victim
linux host.

One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.

Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.

After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.

This is slightly increasing memory size used by the hash table,
we do not expect this to be a problem.

Following patch is dealing with the same issue in IPv4.

Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a00df2ca

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 9dfa859d

David S. Miller authored Aug 30, 2021

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Clean up and consolidate ct ecache infrastructure by merging ct and
   expect notifiers, from Florian Westphal.

2) Missing counters and timestamp in nfnetlink_queue and _log conntrack
   information.

3) Missing error check for xt_register_template() in iptables mangle,
   as a incremental fix for the previous pull request, also from
   Florian Westphal.

4) Add netfilter hooks for the SRv6 lightweigh tunnel driver, from
   Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl
   to make sure existing netfilter rulesets do not break. There is
   a static key to disable the hooks by default.

   The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable
   impact in the seg6_input path for non-netfilter users: similar
   numbers with and without this patch.

   This is a sample of the perf report output:

    11.67%  kpktgend_0       [ipv6]                    [k] ipv6_get_saddr_eval
     7.89%  kpktgend_0       [ipv6]                    [k] __ipv6_addr_label
     7.52%  kpktgend_0       [ipv6]                    [k] __ipv6_dev_get_saddr
     6.63%  kpktgend_0       [kernel.vmlinux]          [k] asm_exc_nmi
     4.74%  kpktgend_0       [ipv6]                    [k] fib6_node_lookup_1
     3.48%  kpktgend_0       [kernel.vmlinux]          [k] pskb_expand_head
     3.33%  kpktgend_0       [ipv6]                    [k] ip6_rcv_core.isra.29
     3.33%  kpktgend_0       [ipv6]                    [k] seg6_do_srh_encap
     2.53%  kpktgend_0       [ipv6]                    [k] ipv6_dev_get_saddr
     2.45%  kpktgend_0       [ipv6]                    [k] fib6_table_lookup
     2.24%  kpktgend_0       [kernel.vmlinux]          [k] ___cache_free
     2.16%  kpktgend_0       [ipv6]                    [k] ip6_pol_route
     2.11%  kpktgend_0       [kernel.vmlinux]          [k] __ipv6_addr_type
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9dfa859d

Merge branch 'IXP46x-PTP-Timer' · 724812d8

David S. Miller authored Aug 30, 2021

Linus Walleij says:

====================
IXP46x PTP Timer clean-up and DT

ChangeLog v2->v3:

- Dropped the patch enabling compile tests: we are still dependent
  on some machine-specific headers. The plan is to get rid of this
  after device tree conversion. We include one of the compile testing
  fixes anyway, because it is nice to have fixed.

- Rebased on the latest net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

724812d8

ixp4xx_eth: Probe the PTP module from the device tree · e9e50622

Linus Walleij authored Aug 28, 2021

This adds device tree probing support for the PTP module
adjacent to the ethernet module. It is pretty straight
forward, all resources are in the device tree as they
come to the platform device.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9e50622

ixp4xx_eth: Add devicetree bindings · 323fb75d

Linus Walleij authored Aug 28, 2021

This adds device tree bindings for the IXP46x PTP Timer, a companion
to the IXP4xx ethernet in newer platforms.

Cc: devicetree@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

323fb75d

ixp4xx_eth: Stop referring to GPIOs · 13dc9319

Linus Walleij authored Aug 28, 2021

The driver is being passed interrupts, then looking up the
same interrupts as GPIOs a second time to convert them into
interrupts and set properties on them.

This is pointless: the GPIO and irqchip APIs of a GPIO chip
are orthogonal. Just request the interrupts and be done
with it, drop reliance on any GPIO functions or definitions.

Use devres-managed functions and add a small devress quirk
to unregister the clock as well and we can rely on devres
to handle all the resources and cut down a bunch of
boilerplate in the process.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

13dc9319

ixp4xx_eth: fix compile-testing · f52749a2

Arnd Bergmann authored Aug 28, 2021

Change the driver to use portable integer types to avoid warnings
during compile testing, including:

drivers/net/ethernet/xscale/ixp4xx_eth.c:721:21: error: cast to 'u32 *' (aka 'unsigned int *') from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
        memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4);
                           ^
drivers/net/ethernet/xscale/ixp4xx_eth.c:963:12: error: incompatible pointer types passing 'u32 *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
                                              &port->desc_tab_phys)))
                                              ^~~~~~~~~~~~~~~~~~~~
include/linux/dmapool.h:27:20: note: passing argument to parameter 'handle' here
                     dma_addr_t *handle);
                                 ^
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f52749a2

ixp4xx_eth: make ptp support a platform driver · 9055a2f5

Arnd Bergmann authored Aug 28, 2021

After the recent ixp4xx cleanups, the ptp driver has gained a
build failure in some configurations:

drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init':
drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function)

Avoid the last bit of hardcoded constants from platform headers
by turning the ptp driver bit into a platform driver and passing
the IRQ and MMIO address as resources.

This is a bit tricky:

- The interface between the two drivers is now the new
  ixp46x_ptp_find() function, replacing the global
  ixp46x_phc_index variable. The call is done as late
  as possible, in hwtstamp_set(), to ensure that the
  ptp device is fully probed.

- As the ptp driver is now called by the network driver, the
  link dependency is reversed, which in turn requires a small
  Makefile hack

- The GPIO number is still left hardcoded. This is clearly not
  great, but it can be addressed later. Note that commit 98ac0cc2
  ("ARM: ixp4xx: Convert to MULTI_IRQ_HANDLER") changed the
  IRQ number to something meaningless. Passing the correct IRQ
  in a resource fixes this.

- When the PTP driver is disabled, ethtool .get_ts_info()
  now correctly lists only software timestamping regardless
  of the hardware.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Fix a missing include]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9055a2f5

Merge branch 'hns3-cleanups' · 27c77943

David S. Miller authored Aug 30, 2021

Guangbin Huang says:

====================
net: hns3: add some cleanups

This series includes some cleanups for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

27c77943

net: hns3: uniform parameter name of hclge_ptp_clean_tx_hwts() · 52d89333

Hao Chen authored Aug 30, 2021

The parameter name of hclge_ptp_clean_tx_hwts() in declaration is "dev",
but the definition of this function is used the common name "hdev" as
other functions, so modify it.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52d89333

net: hnss3: use max() to simplify code · 38b99e1e

Hao Chen authored Aug 30, 2021

Replace the "? :" statement wich max() to simplify code.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38b99e1e

net: hns3: modify a print format of hns3_dbg_queue_map() · 5aea2da5

Hao Chen authored Aug 30, 2021

The type of tqp_vector->vector_irq is int, so modify its print format
to "%d".
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5aea2da5

net: hns3: refine function hclge_dbg_dump_tm_pri() · 04d96139

Guangbin Huang authored Aug 30, 2021

To improve flexibility, simplicity and maintainability to dump info of
every element of tm priority, add a struct hclge_dbg_item array of tm
priority and fill string of every data according to this array.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04d96139

net: hns3: reconstruct function hclge_ets_validate() · 161ad669

Guangbin Huang authored Aug 30, 2021

This patch reconstructs function hclge_ets_validate() to reduce the code
cycle complexity and make code more concise.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

161ad669

net: hns3: reconstruct function hns3_self_test · 4c8dab1c

Peng Li authored Aug 30, 2021

This patch reconstructs function hns3_self_test to reduce the code
cycle complexity and make code more concise.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c8dab1c

net: hns3: initialize each member of structure array on a separate line · 60fe9ff9

Jiaran Zhang authored Aug 30, 2021

To make the format of each member initialization of structure array
clearer, initialize each member on a separate line.
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60fe9ff9

Merge branch 'bnxt_en-fw-messages' · 49f9df5b

David S. Miller authored Aug 30, 2021

Michael Chan says:

====================
bnxt_en: Implement new driver APIs to send FW messages

The current driver APIs to send messages to the firmware allow only one
outstanding message in flight.  There is only one buffer for the firmware
response for each firmware channel.  To send a firmware message, all
callers must take a mutex and it is released after the firmware response
has been read.  This scheme does not allow multiple firmware messages
in flight.  Firmware may take a long time to respond to some messages
(e.g. NVRAM related ones) and this causes the mutex to be held for
a long time, blocking other callers.

This patchset intoduces the new driver APIs to address the above
shortcomings.  The new APIs are compatible with new and old firmware.
But the new deferred firmware response mechanism will require newer
firmware in order to allow multiple outstanding firmware commands.

All callers are updated to use the new APIs.

v2: Patch 4 and patch 9 updated to fix issues reported by test robot
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

49f9df5b

bnxt_en: support multiple HWRM commands in flight · 68f684e2

Edwin Peer authored Aug 29, 2021

Add infrastructure to maintain a pending list of HWRM commands awaiting
completion and reduce the scope of the hwrm_cmd_lock mutex so that it
protects only the request mailbox. The mailbox is free to use for one
or more concurrent commands after receiving deferred response events.

For uniformity and completeness, use the same pending list for
collecting completions for commands that respond via a completion ring.
These commands are only used for freeing rings and for IRQ test and
we only support one such command in flight.

Note deferred responses are also only supported on the main channel.
The secondary channel (KONG) does not support deferred responses.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68f684e2

bnxt_en: remove legacy HWRM interface · b34695a8

Edwin Peer authored Aug 29, 2021

There are no longer any callers relying on the old API.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b34695a8

bnxt_en: update all firmware calls to use the new APIs · bbf33d1d

Edwin Peer authored Aug 29, 2021

The conversion follows this general pattern for most of the calls:

1. The input message is changed from a stack variable initialized
using bnxt_hwrm_cmd_hdr_init() to a pointer allocated and intialized
using hwrm_req_init().

2. If we don't need to read the firmware response, the hwrm_send_message()
call is replaced with hwrm_req_send().

3. If we need to read the firmware response, the mutex lock is replaced
by hwrm_req_hold() to hold the response.  When the response is read, the
mutex unlock is replaced by hwrm_req_drop().

If additional DMA buffers are needed for firmware response data, the
hwrm_req_dma_slice() is used instead of calling dma_alloc_coherent().

Some minor refactoring is also done while doing these conversions.

v2: Fix unintialized variable warnings in __bnxt_hwrm_get_tx_rings()
and bnxt_approve_mac()
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbf33d1d

bnxt_en: use link_lock instead of hwrm_cmd_lock to protect link_info · 3c10ed49

Edwin Peer authored Aug 29, 2021

We currently use the hwrm_cmd_lock to serialize the update of the
firmware's link status response data and the copying of link status data
to the VF. This won't work when we update the firmware message APIs, so
we use the link_lock mutex instead. All link_info data should be
updated under the link_lock mutex. Also add link_lock to functions that
touch link_info in __bnxt_open_nic() and bnxt_probe_phy(). The locking
is probably not strictly necessary during probe, but it's more consistent.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c10ed49

bnxt_en: add support for HWRM request slices · 21380817

Edwin Peer authored Aug 29, 2021

Slices are a mechanism for suballocating DMA mapped regions from the
request buffer. Such regions can be used for indirect command data
instead of creating new mappings with dma_alloc_coherent().

The advantage of using a slice is that the lifetime of the slice is
bound to the request and will be automatically unmapped when the
request is consumed.

A single external region is also supported. This allows for regions
that will not fit inside the spare request buffer space such that
the same API can be used consistently even for larger mappings.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21380817

bnxt_en: add HWRM request assignment API · ecddc29d

Edwin Peer authored Aug 29, 2021

hwrm_req_replace() provides an assignment like operation to replace a
managed HWRM request object with data from a pre-built source. This is
useful for handling request data provided by higher layer HWRM clients.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecddc29d

bnxt_en: discard out of sequence HWRM responses · 02b9aa10

Edwin Peer authored Aug 29, 2021

During firmware crash recovery, it is possible for firmware to respond
to stale HWRM commands that have already timed out. Because response
buffers may be reused, any out of sequence responses need to be ignored
and only the matching seq_id should be accepted.

Also, READ_ONCE should be used for the reads from the DMA buffer to
ensure that the necessary loads are scheduled.
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02b9aa10

bnxt_en: introduce new firmware message API based on DMA pools · f9ff5782

Edwin Peer authored Aug 29, 2021

This change constitutes a major step towards supporting multiple
firmware commands in flight by maintaining a separate response buffer
for the duration of each request. These firmware commands are also
known as Hardware Resource Manager (HWRM) commands.  Using separate
response buffers requires an API change in order for callers to be
able to free the buffer when done.

It is impossible to keep the existing APIs unchanged.  The existing
usage for a simple HWRM message request such as the following:

        struct input req = {0};
        bnxt_hwrm_cmd_hdr_init(bp, &req, REQ_TYPE, -1, -1);
        rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
        if (rc)
                /* error */

changes to:

         struct input *req;
         rc = hwrm_req_init(bp, req, REQ_TYPE);
         if (rc)
                 /* error */
         rc = hwrm_req_send(bp, req); /* consumes req */
         if (rc)
                 /* error */

The key changes are:

1. The req is no longer allocated on the stack.
2. The caller must call hwrm_req_init() to allocate a req buffer and
   check for a valid buffer.
3. The req buffer is automatically released when hwrm_req_send() returns.
4. If the caller wants to check the firmware response, the caller must
   call hwrm_req_hold() to take ownership of the response buffer and
   release it afterwards using hwrm_req_drop().  The caller is no longer
   required to explicitly hold the hwrm_cmd_lock mutex to read the
   response.
5. Because the firmware commands and responses all have different sizes,
   some safeguards are added to the code.

This patch maintains legacy API compatibiltiy, implementing the old
API in terms of the new.  The follow-on patches will convert all
callers to use the new APIs.

v2: Fix redefined writeq with parisc .config
    Fix "cast from pointer to integer of different size" warning in
hwrm_calc_sentinel()
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9ff5782

bnxt_en: move HWRM API implementation into separate file · 3c8c20db

Edwin Peer authored Aug 29, 2021

Move all firmware messaging functions and definitions to new
bnxt_hwrm.[ch].  The follow-on patches will make major modifications
to these APIs.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c8c20db

bnxt_en: Refactor the HWRM_VER_GET firmware calls · 7b370ad7

Edwin Peer authored Aug 29, 2021

Refactor the code so that __bnxt_hwrm_ver_get() does not call
bnxt_hwrm_do_send_msg() directly.  The new APIs will not expose this
internal call.  Add a new bnxt_hwrm_poll() to poll the HWRM_VER_GET
firmware call silently.  The other bnxt_hwrm_ver_get() function will
send the HWRM_VER_GET message directly with error logs enabled.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b370ad7