1. 30 Aug, 2021 37 commits
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 19a31d79
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf-next 2021-08-31
      
      We've added 116 non-merge commits during the last 17 day(s) which contain
      a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).
      
      The main changes are:
      
      1) Add opaque bpf_cookie to perf link which the program can read out again,
         to be used in libbpf-based USDT library, from Andrii Nakryiko.
      
      2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.
      
      3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.
      
      4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
         to another congestion control algorithm during init, from Martin KaFai Lau.
      
      5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.
      
      6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.
      
      7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
         progs, from Xu Liu and Stanislav Fomichev.
      
      8) Support for __weak typed ksyms in libbpf, from Hao Luo.
      
      9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.
      
      10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.
      
      11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.
      
      12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
          Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, others.
      
      13) Another big batch to revamp XDP samples in order to give them consistent look
          and feel, from Kumar Kartikeya Dwivedi.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
        MAINTAINERS: Remove self from powerpc BPF JIT
        selftests/bpf: Fix potential unreleased lock
        samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
        selftests/bpf: Reduce more flakyness in sockmap_listen
        bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
        bpf: selftests: Add dctcp fallback test
        bpf: selftests: Add connect_to_fd_opts to network_helpers
        bpf: selftests: Add sk_state to bpf_tcp_helpers.h
        bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
        selftests: xsk: Preface options with opt
        selftests: xsk: Make enums lower case
        selftests: xsk: Generate packets from specification
        selftests: xsk: Generate packet directly in umem
        selftests: xsk: Simplify cleanup of ifobjects
        selftests: xsk: Decrease sending speed
        selftests: xsk: Validate tx stats on tx thread
        selftests: xsk: Simplify packet validation in xsk tests
        selftests: xsk: Rename worker_* functions that are not thread entry points
        selftests: xsk: Disassociate umem size with packets sent
        selftests: xsk: Remove end-of-test packet
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      sch_htb: Fix inconsistency when leaf qdisc creation fails · ca49bfd9
      Maxim Mikityanskiy authored
      In HTB offload mode, qdiscs of leaf classes are grafted to netdev
      queues. sch_htb expects the dev_queue field of these qdiscs to point to
      the corresponding queues. However, qdisc creation may fail, and in that
      case noop_qdisc is used instead. Its dev_queue doesn't point to the
      right queue, so sch_htb can lose track of used netdev queues, which will
      cause internal inconsistencies.
      
      This commit fixes this bug by keeping track of the netdev queue inside
      struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
      new field, the two values are synced on writes, and WARNs are added to
      assert equality of the two values.
      
      The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
      the driver used to pass the old and new queue IDs to sch_htb. Now that
      there is a new field (offload_queue) in struct htb_class that needs to
      be updated on this operation, the driver will pass the old class ID to
      sch_htb instead (it already knows the new class ID).
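The bookkeeping idea described above can be sketched in user-space C. This is only an illustration under stated assumptions: the struct and function names (`htb_class_sketch`, `graft_leaf`, `class_queue`) are hypothetical and are not the kernel's definitions; only the field name `offload_queue` comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a netdev queue and a qdisc. */
struct queue { int id; };
struct qdisc { struct queue *dev_queue; };

/* Hypothetical class: it remembers its queue itself instead of
 * trusting whatever the leaf qdisc points at. */
struct htb_class_sketch {
    struct qdisc *leaf_q;
    struct queue *offload_queue;
};

/* Fallback qdisc used when creation fails; its dev_queue is not the
 * class's real queue. */
static struct qdisc noop_sketch = { .dev_queue = NULL };

/* Attach a leaf qdisc (or the fallback when new_q is NULL) and keep
 * the class's own queue field in sync on every write. */
static void graft_leaf(struct htb_class_sketch *cl, struct queue *q,
                       struct qdisc *new_q)
{
    cl->offload_queue = q;
    if (new_q)
        new_q->dev_queue = q;
    cl->leaf_q = new_q ? new_q : &noop_sketch;
}

/* All reads go through the class field, never through leaf_q. */
static struct queue *class_queue(const struct htb_class_sketch *cl)
{
    return cl->offload_queue;
}
```

With this layout, a failed qdisc creation leaves `leaf_q` pointing at the fallback, while `class_queue()` still returns the queue the class actually owns.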
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      MAINTAINERS: Remove self from powerpc BPF JIT · fca35b11
      Sandipan Das authored
      Stepping down as I haven't had a chance to look into the powerpc
      BPF JIT compilers for a while.
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210827111905.396145-1-sandipan@linux.ibm.com
      net: ipv4: Fix the warning for dereference · 1b9fbe81
      Yajun Deng authored
      Add an if statement to avoid the warning.
      
      Dan Carpenter reports:
      The patch faf482ca: "net: ipv4: Move ip_options_fragment() out of
      loop" from Aug 23, 2021, leads to the following Smatch complaint:
      
          net/ipv4/ip_output.c:833 ip_do_fragment()
          warn: variable dereferenced before check 'iter.frag' (see line 828)
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: faf482ca ("net: ipv4: Move ip_options_fragment() out of loop")
      Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: qrtr: make checks in qrtr_endpoint_post() stricter · aaa8e492
      Dan Carpenter authored
      These checks are still not strict enough.  The main problem is that if
      "cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is
      guaranteed to be 4 but we need to be at least 16 bytes.  In fact, we
      can reject everything smaller than sizeof(*pkt) which is 20 bytes.
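A stricter length check of this kind might look like the following user-space sketch. It is an assumption-laden illustration, not the actual patch: `CTRL_PKT_SIZE` and `ctrl_payload_ok()` are hypothetical names, and the 20-byte figure is taken from the text above. The sketch also requires the payload size to be a multiple of 4 rather than rounding it up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(*pkt), 20 bytes per the text above. */
#define CTRL_PKT_SIZE 20u

/* Accept a received buffer only if the payload is non-empty, its size
 * is a multiple of 4, header plus payload exactly fill the buffer, and
 * the payload can hold a whole control packet. */
static bool ctrl_payload_ok(size_t len, size_t hdrlen, size_t size)
{
    if (len < hdrlen)
        return false;
    if (size == 0 || (size & 3) != 0)
        return false;
    if (len != size + hdrlen)
        return false;
    return size >= CTRL_PKT_SIZE;
}
```

Rejecting short payloads up front means later code never reads control-packet fields that lie beyond the received data.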
      
      Also I don't like the ALIGN(size, 4).  It's better to just insist that
      the data is aligned at the start.
      
      Fixes: 0baa99ee ("net: qrtr: Allow non-immediate node routing")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fix array-index-out-of-bounds in taprio_change · efe487fc
      Haimin Zhang authored
      syzbot reported an array-index-out-of-bounds in taprio_change:
      index 16 is out of range for type '__u16 [16]'.
      That is because mqprio->num_tc is larger than TC_MAX_QUEUE, so we check
      the return value of netdev_set_num_tc().
      
      Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com
      Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: fix NULL pointer reference in cipso_v4_doi_free · e842cb60
      王贇 authored
      In netlbl_cipsov4_add_std(), when the allocation of 'doi_def->map.std'
      fails, we sometimes observe a panic:
      
        BUG: kernel NULL pointer dereference, address:
        ...
        RIP: 0010:cipso_v4_doi_free+0x3a/0x80
        ...
        Call Trace:
         netlbl_cipsov4_add_std+0xf4/0x8c0
         netlbl_cipsov4_add+0x13f/0x1b0
         genl_family_rcv_msg_doit.isra.15+0x132/0x170
         genl_rcv_msg+0x125/0x240
      
      This is because cipso_v4_doi_free() does not check
      'doi_def->map.std' when doi_def->type has the value 1, which
      is possible, since netlbl_cipsov4_add_std() has not initialized
      it before allocating 'doi_def->map.std'.
      
      This patch just adds the check to prevent a panic in similar
      cases.
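The guard described above can be illustrated with a small user-space sketch. The types and names here (`std_map`, `doi_def_sketch`, `doi_free_sketch`, `MAP_TYPE_STD`) are hypothetical simplifications, not the kernel's structures; the point is only that the nested pointer is tested before it is dereferenced.

```c
#include <stdlib.h>

/* Hypothetical, simplified versions of the structures involved. */
struct std_map { int *lvl; int *cat; };
struct doi_def_sketch { int type; struct std_map *std; };

#define MAP_TYPE_STD 1 /* assumed to correspond to the "value 1" above */

/* Free a definition. The nested map is only touched when it was
 * actually allocated, so a half-initialized object is handled safely.
 * Returns 1 if something was freed, 0 for a NULL argument. */
static int doi_free_sketch(struct doi_def_sketch *d)
{
    if (!d)
        return 0;
    if (d->type == MAP_TYPE_STD && d->std) { /* the added NULL check */
        free(d->std->lvl);
        free(d->std->cat);
        free(d->std);
    }
    free(d);
    return 1;
}
```

Without the `d->std` test, an object whose type is already set but whose map allocation failed would be dereferenced through a NULL pointer.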
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'inet-exceptions-less-predictable' · 63cad4c7
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      inet: make exception handling less predictible
      
      This second round of patches addresses Keyu Man's recommendations
      to make Linux hosts more robust against a class of brute force attacks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ipv4: make exception cache less predictible · 67d6d681
      Eric Dumazet authored
      Even after commit 6457378f ("ipv4: use siphash instead of Jenkins in
      fnhe_hashfun()"), an attacker can still use brute force to learn
      some secrets from a victim linux host.
      
      One way to defeat these attacks is to make the max depth of the hash
      table bucket a random value.
      
      Before this patch, each bucket of the hash table used to store exceptions
      could contain 6 items under attack.
      
      After the patch, each bucket contains a random number of items,
      between 6 and 10. The attacker can no longer infer secrets.
      
      This slightly increases the memory used by the hash table,
      by 50% on average; we do not expect this to be a problem.
      
      This patch is more complex than the prior one (IPv6 equivalent),
      because IPv4 was reusing the oldest entry.
      Since we need to be able to evict more than one entry per
      update_or_create_fnhe() call, I had to replace
      fnhe_oldest() with fnhe_remove_oldest().
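The randomized depth limit can be sketched in user-space C as follows. This is a simplified model under stated assumptions: `RECLAIM_DEPTH`, `pick_max_depth()` and `insert_with_random_limit()` are hypothetical names, `rand()` stands in for the kernel's random source, and eviction is modelled as a counter decrement rather than a real list removal.

```c
#include <stdlib.h>

/* Hypothetical base depth; with it the limit below falls in 5..9, so a
 * bucket holds 6..10 entries once the new entry has been added. */
#define RECLAIM_DEPTH 5

/* Pick a fresh limit for every insertion so that an observer cannot
 * predict when an eviction will happen. */
static int pick_max_depth(void)
{
    return RECLAIM_DEPTH + rand() % RECLAIM_DEPTH;
}

/* Model of one insertion: evict oldest entries (possibly several) until
 * the bucket is within the random limit, then add the new entry.
 * Returns the resulting bucket depth. */
static int insert_with_random_limit(int depth)
{
    int max_depth = pick_max_depth();

    while (depth > max_depth)
        depth--; /* stands in for removing the oldest entry */
    return depth + 1;
}
```

Because the limit can be lower on one call than on the previous one, a single insertion may need to evict more than one entry, which is why a remove-oldest loop is used instead of reusing a single oldest slot.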
      
      Also note that we will queue extra kfree_rcu() calls under stress,
      which hopefully won't be too big an issue.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Tested-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ipv6: make exception cache less predictible · a00df2ca
      Eric Dumazet authored
      Even after commit 4785305c ("ipv6: use siphash in rt6_exception_hash()"),
      an attacker can still use brute force to learn some secrets from a victim
      linux host.
      
      One way to defeat these attacks is to make the max depth of the hash
      table bucket a random value.
      
      Before this patch, each bucket of the hash table used to store exceptions
      could contain 6 items under attack.
      
      After the patch, each bucket contains a random number of items,
      between 6 and 10. The attacker can no longer infer secrets.
      
      This slightly increases the memory used by the hash table;
      we do not expect this to be a problem.
      
      Following patch is dealing with the same issue in IPv4.
      
      Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 9dfa859d
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      1) Clean up and consolidate ct ecache infrastructure by merging ct and
         expect notifiers, from Florian Westphal.
      
      2) Missing counters and timestamp in nfnetlink_queue and _log conntrack
         information.
      
      3) Missing error check for xt_register_template() in iptables mangle,
         as an incremental fix for the previous pull request, also from
         Florian Westphal.
      
      4) Add netfilter hooks for the SRv6 lightweight tunnel driver, from
         Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl
         to make sure existing netfilter rulesets do not break. There is
         a static key to disable the hooks by default.
      
         The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable
         impact in the seg6_input path for non-netfilter users: similar
         numbers with and without this patch.
      
         This is a sample of the perf report output:
      
          11.67%  kpktgend_0       [ipv6]                    [k] ipv6_get_saddr_eval
           7.89%  kpktgend_0       [ipv6]                    [k] __ipv6_addr_label
           7.52%  kpktgend_0       [ipv6]                    [k] __ipv6_dev_get_saddr
           6.63%  kpktgend_0       [kernel.vmlinux]          [k] asm_exc_nmi
           4.74%  kpktgend_0       [ipv6]                    [k] fib6_node_lookup_1
           3.48%  kpktgend_0       [kernel.vmlinux]          [k] pskb_expand_head
           3.33%  kpktgend_0       [ipv6]                    [k] ip6_rcv_core.isra.29
           3.33%  kpktgend_0       [ipv6]                    [k] seg6_do_srh_encap
           2.53%  kpktgend_0       [ipv6]                    [k] ipv6_dev_get_saddr
           2.45%  kpktgend_0       [ipv6]                    [k] fib6_table_lookup
           2.24%  kpktgend_0       [kernel.vmlinux]          [k] ___cache_free
           2.16%  kpktgend_0       [ipv6]                    [k] ip6_pol_route
           2.11%  kpktgend_0       [kernel.vmlinux]          [k] __ipv6_addr_type
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'IXP46x-PTP-Timer' · 724812d8
      David S. Miller authored
      Linus Walleij says:
      
      ====================
      IXP46x PTP Timer clean-up and DT
      
      ChangeLog v2->v3:
      
      - Dropped the patch enabling compile tests: we are still dependent
        on some machine-specific headers. The plan is to get rid of this
        after device tree conversion. We include one of the compile testing
        fixes anyway, because it is nice to have fixed.
      
      - Rebased on the latest net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ixp4xx_eth: Probe the PTP module from the device tree · e9e50622
      Linus Walleij authored
      This adds device tree probing support for the PTP module
      adjacent to the ethernet module. It is pretty straightforward:
      all resources are in the device tree as they
      come to the platform device.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ixp4xx_eth: Add devicetree bindings · 323fb75d
      Linus Walleij authored
      This adds device tree bindings for the IXP46x PTP Timer, a companion
      to the IXP4xx ethernet in newer platforms.
      
      Cc: devicetree@vger.kernel.org
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ixp4xx_eth: Stop referring to GPIOs · 13dc9319
      Linus Walleij authored
      The driver is being passed interrupts, then looking up the
      same interrupts as GPIOs a second time to convert them into
      interrupts and set properties on them.
      
      This is pointless: the GPIO and irqchip APIs of a GPIO chip
      are orthogonal. Just request the interrupts and be done
      with it, drop reliance on any GPIO functions or definitions.
      
      Use devres-managed functions and add a small devres quirk
      to unregister the clock as well and we can rely on devres
      to handle all the resources and cut down a bunch of
      boilerplate in the process.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ixp4xx_eth: fix compile-testing · f52749a2
      Arnd Bergmann authored
      Change the driver to use portable integer types to avoid warnings
      during compile testing, including:
      
      drivers/net/ethernet/xscale/ixp4xx_eth.c:721:21: error: cast to 'u32 *' (aka 'unsigned int *') from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
              memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4);
                                 ^
      drivers/net/ethernet/xscale/ixp4xx_eth.c:963:12: error: incompatible pointer types passing 'u32 *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
                                                    &port->desc_tab_phys)))
                                                    ^~~~~~~~~~~~~~~~~~~~
      include/linux/dmapool.h:27:20: note: passing argument to parameter 'handle' here
                           dma_addr_t *handle);
                                       ^
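The first warning above comes from squeezing a pointer through `int`. A portable way to round a pointer down to a 4-byte boundary is to go through `uintptr_t`, as in this minimal sketch; `align_down_u32()` is a hypothetical helper name and not necessarily what the driver uses.

```c
#include <stdint.h>

/* Round a pointer down to the previous 4-byte boundary. uintptr_t is
 * wide enough to hold a pointer on both 32-bit and 64-bit targets, so
 * no bits are lost in the conversion. */
static uint32_t *align_down_u32(void *p)
{
    return (uint32_t *)((uintptr_t)p & ~(uintptr_t)3);
}
```

The mask is cast to `uintptr_t` as well, so that `~3` is computed at pointer width instead of `int` width.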
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ixp4xx_eth: make ptp support a platform driver · 9055a2f5
      Arnd Bergmann authored
      After the recent ixp4xx cleanups, the ptp driver has gained a
      build failure in some configurations:
      
      drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init':
      drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function)
      
      Avoid the last bit of hardcoded constants from platform headers
      by turning the ptp driver bit into a platform driver and passing
      the IRQ and MMIO address as resources.
      
      This is a bit tricky:
      
      - The interface between the two drivers is now the new
        ixp46x_ptp_find() function, replacing the global
        ixp46x_phc_index variable. The call is done as late
        as possible, in hwtstamp_set(), to ensure that the
        ptp device is fully probed.
      
      - As the ptp driver is now called by the network driver, the
        link dependency is reversed, which in turn requires a small
        Makefile hack
      
      - The GPIO number is still left hardcoded. This is clearly not
        great, but it can be addressed later. Note that commit 98ac0cc2
        ("ARM: ixp4xx: Convert to MULTI_IRQ_HANDLER") changed the
        IRQ number to something meaningless. Passing the correct IRQ
        in a resource fixes this.
      
      - When the PTP driver is disabled, ethtool .get_ts_info()
        now correctly lists only software timestamping regardless
        of the hardware.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      [Fix a missing include]
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'hns3-cleanups' · 27c77943
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some cleanups
      
      This series includes some cleanups for the HNS3 ethernet driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: uniform parameter name of hclge_ptp_clean_tx_hwts() · 52d89333
      Hao Chen authored
      The parameter name of hclge_ptp_clean_tx_hwts() in the declaration is "dev",
      but the definition of this function uses the common name "hdev" like
      other functions, so modify it.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: use max() to simplify code · 38b99e1e
      Hao Chen authored
      Replace the "? :" statement with max() to simplify code.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: modify a print format of hns3_dbg_queue_map() · 5aea2da5
      Hao Chen authored
      The type of tqp_vector->vector_irq is int, so modify its print format
      to "%d".
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: refine function hclge_dbg_dump_tm_pri() · 04d96139
      Guangbin Huang authored
      To improve flexibility, simplicity and maintainability to dump info of
      every element of tm priority, add a struct hclge_dbg_item array of tm
      priority and fill string of every data according to this array.
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: reconstruct function hclge_ets_validate() · 161ad669
      Guangbin Huang authored
      This patch reconstructs function hclge_ets_validate() to reduce the code
      cycle complexity and make code more concise.
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: reconstruct function hns3_self_test · 4c8dab1c
      Peng Li authored
      This patch reconstructs function hns3_self_test to reduce the code
      cycle complexity and make code more concise.
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: hns3: initialize each member of structure array on a separate line · 60fe9ff9
      Jiaran Zhang authored
      To make the format of each member initialization of structure array
      clearer, initialize each member on a separate line.
      Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'bnxt_en-fw-messages' · 49f9df5b
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Implement new driver APIs to send FW messages
      
      The current driver APIs to send messages to the firmware allow only one
      outstanding message in flight.  There is only one buffer for the firmware
      response for each firmware channel.  To send a firmware message, all
      callers must take a mutex and it is released after the firmware response
      has been read.  This scheme does not allow multiple firmware messages
      in flight.  Firmware may take a long time to respond to some messages
      (e.g. NVRAM related ones) and this causes the mutex to be held for
      a long time, blocking other callers.
      
      This patchset introduces the new driver APIs to address the above
      shortcomings.  The new APIs are compatible with new and old firmware.
      But the new deferred firmware response mechanism will require newer
      firmware in order to allow multiple outstanding firmware commands.
      
      All callers are updated to use the new APIs.
      
      v2: Patch 4 and patch 9 updated to fix issues reported by test robot
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: support multiple HWRM commands in flight · 68f684e2
      Edwin Peer authored
      Add infrastructure to maintain a pending list of HWRM commands awaiting
      completion and reduce the scope of the hwrm_cmd_lock mutex so that it
      protects only the request mailbox. The mailbox is free to use for one
      or more concurrent commands after receiving deferred response events.
      
      For uniformity and completeness, use the same pending list for
      collecting completions for commands that respond via a completion ring.
      These commands are only used for freeing rings and for IRQ test and
      we only support one such command in flight.
      
      Note deferred responses are also only supported on the main channel.
      The secondary channel (KONG) does not support deferred responses.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: remove legacy HWRM interface · b34695a8
      Edwin Peer authored
      There are no longer any callers relying on the old API.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: update all firmware calls to use the new APIs · bbf33d1d
      Edwin Peer authored
      The conversion follows this general pattern for most of the calls:
      
      1. The input message is changed from a stack variable initialized
      using bnxt_hwrm_cmd_hdr_init() to a pointer allocated and initialized
      using hwrm_req_init().
      
      2. If we don't need to read the firmware response, the hwrm_send_message()
      call is replaced with hwrm_req_send().
      
      3. If we need to read the firmware response, the mutex lock is replaced
      by hwrm_req_hold() to hold the response.  When the response is read, the
      mutex unlock is replaced by hwrm_req_drop().
      
      If additional DMA buffers are needed for firmware response data, the
      hwrm_req_dma_slice() is used instead of calling dma_alloc_coherent().
      
      Some minor refactoring is also done while doing these conversions.
      
      v2: Fix uninitialized variable warnings in __bnxt_hwrm_get_tx_rings()
      and bnxt_approve_mac()
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: use link_lock instead of hwrm_cmd_lock to protect link_info · 3c10ed49
      Edwin Peer authored
      We currently use the hwrm_cmd_lock to serialize the update of the
      firmware's link status response data and the copying of link status data
      to the VF.  This won't work when we update the firmware message APIs, so
      we use the link_lock mutex instead.  All link_info data should be
      updated under the link_lock mutex.  Also add link_lock to functions that
      touch link_info in __bnxt_open_nic() and bnxt_probe_phy(). The locking
      is probably not strictly necessary during probe, but it's more consistent.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: add support for HWRM request slices · 21380817
      Edwin Peer authored
      Slices are a mechanism for suballocating DMA mapped regions from the
      request buffer. Such regions can be used for indirect command data
      instead of creating new mappings with dma_alloc_coherent().
      
      The advantage of using a slice is that the lifetime of the slice is
      bound to the request and will be automatically unmapped when the
      request is consumed.
      
      A single external region is also supported. This allows for regions
      that will not fit inside the spare request buffer space such that
      the same API can be used consistently even for larger mappings.
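The suballocation idea can be modelled with a tiny bump allocator in user-space C. This is only a sketch of the concept: `req_buf_sketch`, `req_slice()`, the 256-byte buffer and the 16-byte alignment are all hypothetical choices, no DMA mapping is involved, and the external-region fallback is not modelled.

```c
#include <stddef.h>

/* Hypothetical request buffer with spare space that slices are carved
 * from; 'used' counts the bytes already handed out. */
struct req_buf_sketch {
    unsigned char mem[256];
    size_t used;
};

/* Carve an n-byte slice out of the spare space, aligned to 16 bytes.
 * Returns NULL when the request does not fit. The slice needs no
 * separate free: it goes away together with the request buffer. */
static void *req_slice(struct req_buf_sketch *b, size_t n)
{
    size_t offset = (b->used + 15) & ~(size_t)15;

    if (offset > sizeof(b->mem) || n > sizeof(b->mem) - offset)
        return NULL;
    b->used = offset + n;
    return b->mem + offset;
}
```

Tying the slice to the buffer is what gives the lifetime property described above: releasing the request releases every slice taken from it.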
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: add HWRM request assignment API · ecddc29d
      Edwin Peer authored
      hwrm_req_replace() provides an assignment like operation to replace a
      managed HWRM request object with data from a pre-built source. This is
      useful for handling request data provided by higher layer HWRM clients.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: discard out of sequence HWRM responses · 02b9aa10
      Edwin Peer authored
      During firmware crash recovery, it is possible for firmware to respond
      to stale HWRM commands that have already timed out. Because response
      buffers may be reused, any out of sequence responses need to be ignored
      and only the matching seq_id should be accepted.
      
      Also, READ_ONCE should be used for the reads from the DMA buffer to
      ensure that the necessary loads are scheduled.
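The sequence check can be illustrated in user-space C. The response layout and the name `resp_is_current()` are hypothetical, and a volatile access is used here as a stand-in for the kernel's READ_ONCE(); this is a sketch of the idea, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical response layout; only the sequence id matters here. */
struct fw_resp_sketch {
    uint16_t seq_id;
    uint16_t error_code;
};

/* Load the sequence id once from the shared buffer and accept the
 * response only if it belongs to the command being waited for. A stale
 * response left over from a timed-out command carries a different id
 * and is ignored. */
static bool resp_is_current(const struct fw_resp_sketch *resp,
                            uint16_t expected_seq)
{
    uint16_t seen = *(const volatile uint16_t *)&resp->seq_id;

    return seen == expected_seq;
}
```

A caller polling the buffer would keep waiting while this returns false, rather than treating any written response as its own.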
      Reviewed-by: Scott Branden <scott.branden@broadcom.com>
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bnxt_en: introduce new firmware message API based on DMA pools · f9ff5782
      Edwin Peer authored
      This change constitutes a major step towards supporting multiple
      firmware commands in flight by maintaining a separate response buffer
      for the duration of each request. These firmware commands are also
      known as Hardware Resource Manager (HWRM) commands.  Using separate
      response buffers requires an API change in order for callers to be
      able to free the buffer when done.
      
      It is impossible to keep the existing APIs unchanged.  The existing
      usage for a simple HWRM message request such as the following:
      
              struct input req = {0};
              bnxt_hwrm_cmd_hdr_init(bp, &req, REQ_TYPE, -1, -1);
              rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
              if (rc)
                      /* error */
      
      changes to:
      
               struct input *req;
               rc = hwrm_req_init(bp, req, REQ_TYPE);
               if (rc)
                       /* error */
               rc = hwrm_req_send(bp, req); /* consumes req */
               if (rc)
                       /* error */
      
      The key changes are:
      
      1. The req is no longer allocated on the stack.
      2. The caller must call hwrm_req_init() to allocate a req buffer and
         check for a valid buffer.
      3. The req buffer is automatically released when hwrm_req_send() returns.
      4. If the caller wants to check the firmware response, the caller must
         call hwrm_req_hold() to take ownership of the response buffer and
         release it afterwards using hwrm_req_drop().  The caller is no longer
         required to explicitly hold the hwrm_cmd_lock mutex to read the
         response.
      5. Because the firmware commands and responses all have different sizes,
         some safeguards are added to the code.
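
The ownership rules in points 2 to 4 can be modelled in user space as follows. This is only a sketch under stated assumptions: every `mock_` name is hypothetical, the reference count is an illustrative device rather than the driver's actual mechanism, and the "firmware response" is a made-up value. It shows the send call consuming the request unless the caller has taken a hold first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; the mock_ names are not driver APIs. */
struct mock_req {
	int type;
	int refs;       /* 1 while owned by the send path, 2 when held */
	int resp_code;  /* filled in by the pretend firmware */
};

static int live_reqs; /* buffers not yet released, for illustration only */

/* Allocate a request; the caller must check the return value. */
static int mock_req_init(struct mock_req **req, int type)
{
	*req = calloc(1, sizeof(**req));
	if (!*req)
		return -1;
	(*req)->type = type;
	(*req)->refs = 1;
	live_reqs++;
	return 0;
}

/* Drop one reference and free the buffer when none remain. */
static void mock_req_put(struct mock_req *req)
{
	assert(req->refs > 0);
	if (--req->refs == 0) {
		free(req);
		live_reqs--;
	}
}

/* Take an extra reference so the response outlives the send call. */
static struct mock_req *mock_req_hold(struct mock_req *req)
{
	req->refs++;
	return req;
}

/* Release the reference taken by mock_req_hold(). */
static void mock_req_drop(struct mock_req *req)
{
	mock_req_put(req);
}

/* "Send" the request: record a response, then consume the send reference. */
static int mock_req_send(struct mock_req *req)
{
	req->resp_code = req->type + 100;
	mock_req_put(req);
	return 0;
}
```

Without a hold, the request must not be touched after `mock_req_send()` returns. With a hold, the response stays readable until `mock_req_drop()` is called, which mirrors the hold and drop pairing described in point 4.
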
      
      This patch maintains legacy API compatibility, implementing the old
      API in terms of the new.  The follow-on patches will convert all
      callers to use the new APIs.
      
      v2: Fix redefined writeq with parisc .config
          Fix "cast from pointer to integer of different size" warning in
          hwrm_calc_sentinel()
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9ff5782
    • Edwin Peer's avatar
      bnxt_en: move HWRM API implementation into separate file · 3c8c20db
      Edwin Peer authored
      Move all firmware messaging functions and definitions to new
      bnxt_hwrm.[ch].  The follow-on patches will make major modifications
      to these APIs.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c8c20db
    • Edwin Peer's avatar
      bnxt_en: Refactor the HWRM_VER_GET firmware calls · 7b370ad7
      Edwin Peer authored
      Refactor the code so that __bnxt_hwrm_ver_get() does not call
      bnxt_hwrm_do_send_msg() directly.  The new APIs will not expose this
      internal call.  Add a new bnxt_hwrm_poll() to poll the HWRM_VER_GET
      firmware call silently.  The other bnxt_hwrm_ver_get() function will
      send the HWRM_VER_GET message directly with error logs enabled.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b370ad7
    • Edwin Peer's avatar
      bnxt_en: remove DMA mapping for KONG response · 6c172d59
      Edwin Peer authored
      The additional response buffer serves no useful purpose. There can
      be only one firmware command in flight due to the hwrm_cmd_lock mutex,
      which is taken for the entire duration of any command completion,
      KONG or otherwise. It is thus safe to share a single DMA buffer.
      
      Removing the code associated with the additional mapping will simplify
      matters in the next patch, which allocates response buffers from DMA
      pools on a per request basis.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c172d59
  2. 29 Aug, 2021 3 commits