1. 09 Mar, 2022 4 commits
  2. 08 Mar, 2022 8 commits
  3. 07 Mar, 2022 6 commits
  4. 06 Mar, 2022 1 commit
  5. 05 Mar, 2022 2 commits
  6. 04 Mar, 2022 3 commits
    • tipc: fix kernel panic when enabling bearer · be4977b8
      Tung Nguyen authored
      When enabling a bearer on a node, a kernel panic is observed:
      
      [    4.498085] RIP: 0010:tipc_mon_prep+0x4e/0x130 [tipc]
      ...
      [    4.520030] Call Trace:
      [    4.520689]  <IRQ>
      [    4.521236]  tipc_link_build_proto_msg+0x375/0x750 [tipc]
      [    4.522654]  tipc_link_build_state_msg+0x48/0xc0 [tipc]
      [    4.524034]  __tipc_node_link_up+0xd7/0x290 [tipc]
      [    4.525292]  tipc_rcv+0x5da/0x730 [tipc]
      [    4.526346]  ? __netif_receive_skb_core+0xb7/0xfc0
      [    4.527601]  tipc_l2_rcv_msg+0x5e/0x90 [tipc]
      [    4.528737]  __netif_receive_skb_list_core+0x20b/0x260
      [    4.530068]  netif_receive_skb_list_internal+0x1bf/0x2e0
      [    4.531450]  ? dev_gro_receive+0x4c2/0x680
      [    4.532512]  napi_complete_done+0x6f/0x180
      [    4.533570]  virtnet_poll+0x29c/0x42e [virtio_net]
      ...
      
      The node in question is receiving activate messages in another
      thread after changing bearer status to allow message sending/
      receiving in current thread:
      
               thread 1           |              thread 2
               --------           |              --------
                                  |
      tipc_enable_bearer()        |
        test_and_set_bit_lock()   |
          tipc_bearer_xmit_skb()  |
                                  | tipc_l2_rcv_msg()
                                  |   tipc_rcv()
                                  |     __tipc_node_link_up()
                                  |       tipc_link_build_state_msg()
                                  |         tipc_link_build_proto_msg()
                                  |           tipc_mon_prep()
                                  |           {
                                  |             ...
                                  |             // null-pointer dereference
                                  |             u16 gen = mon->dom_gen;
                                  |             ...
                                  |           }
        // Not being executed yet |
        tipc_mon_create()         |
        {                         |
          ...                     |
          // allocate             |
          mon = kzalloc();        |
          ...                     |
        }                         |
      
      The monitoring pointer in thread 2 is dereferenced before the monitoring
      data is allocated in thread 1. This causes the kernel panic.
      
      This commit fixes it by allocating the monitoring data before enabling
      the bearer to receive messages.
      
      Fixes: 35c55c98 ("tipc: add neighbor monitoring framework")
      Reported-by: Shuang Li <shuali@redhat.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Fix lost RX packet wakeup race in NAPI receive · 0bf476fc
      Robert Hancock authored
      There is an oddity in the way the RSR register flags propagate to the
      ISR register (and the actual interrupt output) on this hardware: it
      appears that RSR register bits only result in ISR being asserted if the
      interrupt was actually enabled at the time, so enabling interrupts with
      RSR bits already set doesn't trigger an interrupt to be raised. There
      was already a partial fix for this race in the macb_poll function where
      it checked for RSR bits being set and re-triggered NAPI receive.
      However, there was still a race window between checking RSR and
      actually enabling interrupts, where a lost wakeup could happen. It's
      necessary to check again after enabling interrupts to see if RSR was set
      just prior to the interrupt being enabled, and re-trigger receive in that
      case.
      
      This issue was noticed in a point-to-point UDP request-response protocol
      which periodically saw timeouts or abnormally high response times due to
      received packets not being processed in a timely fashion. In many
      applications, more packets arriving, including TCP retransmissions, would
      cause the original packet to be processed, thus masking the issue.
      
      Fixes: 02f7a34f ("net: macb: Re-enable RX interrupt only when RX is done")
      Cc: stable@vger.kernel.org
      Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
      Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'for-net-2022-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 9f3956d6
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix regression with processing of MGMT commands
       - Fix unbalanced unlock in Set Device Flags
      
      * tag 'for-net-2022-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work
        Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags()
      ====================
      
      Link: https://lore.kernel.org/r/20220303210743.314679-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 03 Mar, 2022 16 commits
    • Merge tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b949c21f
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from can, xfrm, wifi, bluetooth, and netfilter.
      
        Lots of various size fixes, the length of the tag speaks for itself.
        Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
        which had been lagging as you pointed out previously. But there's also
        a larger than we'd like portion of fixes for bugs from previous
        releases.
      
        Three more fixes still under discussion, including an xfrm revert for
        uAPI error.
      
        Current release - regressions:
      
         - iwlwifi: don't advertise TWT support, prevent FW crash
      
         - xfrm: fix the if_id check in changelink
      
         - xen/netfront: destroy queues before real_num_tx_queues is zeroed
      
         - bluetooth: fix not checking MGMT cmd pending queue, make scanning
           work again
      
        Current release - new code bugs:
      
         - mptcp: make SIOCOUTQ accurate for fallback socket
      
         - bluetooth: access skb->len after null check
      
         - bluetooth: hci_sync: fix not using conn_timeout
      
         - smc: fix cleanup when register ULP fails
      
         - dsa: restore error path of dsa_tree_change_tag_proto
      
         - iwlwifi: fix build error for IWLMEI
      
         - iwlwifi: mvm: propagate error from request_ownership to the user
      
        Previous releases - regressions:
      
         - xfrm: fix pMTU regression when reported pMTU is too small
      
         - xfrm: fix TCP MSS calculation when pMTU is close to 1280
      
         - bluetooth: fix bt_skb_sendmmsg not allocating partial chunks
      
         - ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks
      
         - ipv6: prevent leaks in igmp6 when input queues get full
      
         - fix up skbs delta_truesize in UDP GRO frag_list
      
         - eth: e1000e: fix possible HW unit hang after an s0ix exit
      
         - eth: e1000e: correct NVM checksum verification flow
      
         - ptp: ocp: fix large time adjustments
      
        Previous releases - always broken:
      
         - tcp: make tcp_read_sock() more robust in presence of urgent data
      
         - xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate
      
         - xfrm: fix xfrm_migrate issues when address family changes
      
         - dcb: flush lingering app table entries for unregistered devices
      
         - smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
      
         - mac80211: fix EAPoL rekey fail in 802.3 rx path
      
         - mac80211: fix forwarded mesh frames AC & queue selection
      
         - netfilter: nf_queue: fix socket access races and bugs
      
         - batman-adv: fix ToCToU iflink problems and check the result belongs
           to the expected net namespace
      
         - can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting
      
         - can: rcar_canfd: register the CAN device when fully ready
      
         - eth: igb, igc: phy: drop premature return leaking HW semaphore
      
         - eth: ixgbe: xsk: change !netif_carrier_ok() handling in
           ixgbe_xmit_zc(), prevent live lock when link goes down
      
         - eth: stmmac: only enable DMA interrupts when ready
      
         - eth: sparx5: move vlan checks before any changes are made
      
         - eth: iavf: fix races around init, removal, resets and vlan ops
      
         - ibmvnic: more reset flow fixes
      
        Misc:
      
         - eth: fix return value of __setup handlers"
      
      * tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
        ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
        net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
        ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
        selftests: mlxsw: resource_scale: Fix return value
        selftests: mlxsw: tc_police_scale: Make test more robust
        net: dcb: disable softirqs in dcbnl_flush_dev()
        bnx2: Fix an error message
        sfc: extend the locking on mcdi->seqno
        net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
        net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
        net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
        tcp: make tcp_read_sock() more robust
        bpf, sockmap: Do not ignore orig_len parameter
        net: ipa: add an interconnect dependency
        net: fix up skbs delta_truesize in UDP GRO frag_list
        iwlwifi: mvm: return value for request_ownership
        nl80211: Update bss channel on channel switch for P2P_CLIENT
        iwlwifi: fix build error for IWLMEI
        ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
        batman-adv: Don't expect inter-netns unique iflink indices
        ...
    • Merge tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · e58bd49d
      Linus Torvalds authored
      Pull MIPS fixes from Thomas Bogendoerfer:
      
       - Fix memory detection for MT7621 devices
      
       - Fix setnocoherentio kernel option
      
       - Fix warning when CONFIG_SCHED_CORE is enabled
      
      * tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: ralink: mt7621: use bitwise NOT instead of logical
        mips: setup: fix setnocoherentio() boolean setting
        MIPS: smp: fill in sibling and core maps earlier
        MIPS: ralink: mt7621: do memory detection on KSEG1
    • Merge tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux · 4d5ae234
      Linus Torvalds authored
      Pull auxdisplay fixes from Miguel Ojeda:
       "A few lcd2s fixes from Andy Shevchenko"
      
      * tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux:
        auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
        auxdisplay: lcd2s: Fix memory leak in ->remove()
        auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
    • ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() · 2d3916f3
      Eric Dumazet authored
      While investigating why a synchronize_net() has been added recently
      in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
      might drop skbs in some cases.
      
      Discussion about removing synchronize_net() from ipv6_mc_down()
      will happen in a different thread.
      
      Fixes: f185de28 ("mld: add new workqueues for process mld events")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change · e1bec7fa
      Vladimir Oltean authored
      The blamed commit said one thing but did another. It explains that we
      should restore the "return err" to the original "goto out_unwind_tagger",
      but instead it replaced it with "goto out_unlock".
      
      When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
      multi-switch tree, the switches would end up not using the same tagging
      protocol.
      
      Fixes: 0b0e2ff1 ("net: dsa: restore error path of dsa_tree_change_tag_proto")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() · 6c7273a2
      Maciej Fijalkowski authored
      Commit c685c69f ("ixgbe: don't do any AF_XDP zero-copy transmit if
      netif is not OK") addressed the ring transient state when
      MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the
      interface to go through down/up. Maurice reported that when carrier is not
      ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU
      cycles due to the constant NAPI rescheduling as ixgbe_poll() states that
      there is still some work to be done.
      
      To fix this, do not set work_done to false for a !netif_carrier_ok().
      
      Fixes: c685c69f ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
      Reported-by: Maurice Baijens <maurice.baijens@ellips.com>
      Tested-by: Maurice Baijens <maurice.baijens@ellips.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'selftests-mlxsw-a-couple-of-fixes' · 312f2d50
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      selftests: mlxsw: A couple of fixes
      
      Patch #1 fixes a breakage due to a change in iproute2 output. The real
      problem is not iproute2, but the fact that the check was not strict
      enough. Fixed by using JSON output instead. Targeting at net so that the
      test will pass as part of old and new kernels regardless of iproute2
      version.
      
      Patch #2 fixes an issue uncovered by the first one.
      ====================
      
      Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mlxsw: resource_scale: Fix return value · 196f9bc0
      Amit Cohen authored
      The test runs several test cases and is supposed to return an error in
      case at least one of them failed.
      
      Currently, the check of the return value of each test case is in the
      wrong place, which can result in the wrong return value. For example:
      
       # TESTS='tc_police' ./resource_scale.sh
       TEST: 'tc_police' [default] 968                                     [FAIL]
               tc police offload count failed
       Error: mlxsw_spectrum: Failed to allocate policer index.
       We have an error talking to the kernel
       Command failed /tmp/tmp.i7Oc5HwmXY:969
       TEST: 'tc_police' [default] overflow 969                            [ OK ]
       ...
       TEST: 'tc_police' [ipv4_max] overflow 969                           [ OK ]
      
       $ echo $?
       0
      
      Fix this by moving the check to be done after each test case.
      
      Fixes: 059b18e2 ("selftests: mlxsw: Return correct error code in resource scale test")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mlxsw: tc_police_scale: Make test more robust · dc975207
      Amit Cohen authored
      The test adds tc filters and checks how many of them were offloaded by
      grepping for 'in_hw'.
      
      iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
      action offload") added offload indication to tc actions, producing the
      following output:
      
       $ tc filter show dev swp2 ingress
       ...
       filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
         eth_type ipv6
         dst_ip 2001:db8:1::7bf
         skip_sw
         in_hw in_hw_count 1
               action order 1:  police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
               ref 1 bind 1
               not_in_hw
               used_hw_stats immediate
      
      The current grep expression matches on both 'in_hw' and 'not_in_hw',
      resulting in incorrect results.
      
      Fix that by using JSON output instead.
      
      Fixes: 5061e773 ("selftests: mlxsw: Add scale test for tc-police")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dcb: disable softirqs in dcbnl_flush_dev() · 10b6bb62
      Vladimir Oltean authored
      Ido Schimmel points out that since commit 52cff74e ("dcbnl : Disable
      software interrupts before taking dcb_lock"), the DCB API can be called
      by drivers from softirq context.
      
      One such in-tree example is the chelsio cxgb4 driver:
      dcb_rpl
      -> cxgb4_dcb_handle_fw_update
         -> dcb_ieee_setapp
      
      If the firmware for this driver happened to send an event which resulted
      in a call to dcb_ieee_setapp() at the exact same time as another
      DCB-enabled interface was unregistering on the same CPU, the softirq
      would deadlock, because the interrupted process was already holding the
      dcb_lock in dcbnl_flush_dev().
      
      Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
      in the rest of the dcbnl code.
      
      Fixes: 91b0383f ("net: dcb: flush lingering app table entries for unregistered devices")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnx2: Fix an error message · 8ccffe9a
      Christophe JAILLET authored
      Fix an error message and report the correct failing function.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: extend the locking on mcdi->seqno · f1fb205e
      Niels Dossche authored
      seqno could be read as a stale value outside of the lock. The lock is
      already acquired to protect the modification of seqno against a possible
      race condition. Place the reading of this value also inside this locking
      to protect it against a possible race condition.
      Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work · 008ee9eb
      Luiz Augusto von Dentz authored
      hci_cmd_sync_queue can be called multiple times, each adding a
      hci_cmd_sync_work_entry, before hci_cmd_sync_work is run so this makes
      sure they are all dequeued properly otherwise it creates a backlog of
      entries that are never run.
      
      Link: https://lore.kernel.org/all/CAJCQCtSeUtHCgsHXLGrSTWKmyjaQDbDNpP4rb0i+RE+L2FTXSA@mail.gmail.com/T/
      Fixes: 6a98e383 ("Bluetooth: Add helper for serialized HCI command execution")
      Tested-by: Chris Clayton <chris2553@googlemail.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags() · 815d5121
      Hans de Goede authored
      There is only one "goto done;" in set_device_flags() and this happens
      *before* hci_dev_lock() is called, move the done label to after the
      hci_dev_unlock() to fix the following unlock imbalance:
      
      [   31.493567] =====================================
      [   31.493571] WARNING: bad unlock balance detected!
      [   31.493576] 5.17.0-rc2+ #13 Tainted: G         C  E
      [   31.493581] -------------------------------------
      [   31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at:
      [   31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth]
      [   31.493684] but there are no more locks to release!
      
      Note this bug has been around for a couple of years, but before
      commit fe92ee64 ("Bluetooth: hci_core: Rework hci_conn_params flags")
      supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so
      the check for unsupported flags which does the "goto done;" never
      triggered.
      
      Fixes: fe92ee64 ("Bluetooth: hci_core: Rework hci_conn_params flags")
      Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Merge branch 'smc-fix' · f8e9bd34
      David S. Miller authored
      D. Wythe says:
      
      ====================
      fix unexpected SMC_CLC_DECL_ERR_REGRMB error
      
      We can easily trigger the SMC_CLC_DECL_ERR_REGRMB exception with the
      following commands:
      
      server: smc_run nginx
      client: smc_run  ./wrk -c 2000 -t 8 -d 20 http://smc-server
      
      The error shows up in two forms:
      
      1. 0x09990003
      2. 0x05000000/0x09990003
      
      Both have the same root cause, but the immediate causes differ.
      
      The root cause is that removing connections from a link group is not
      synchronized with adding/deleting rtoken entries. Even when the number
      of connections is less than SMC_RMBS_PER_LGR_MAX, there is no guarantee
      that a connection can register its rtoken successfully later. In other
      words, the rtoken entry may not have been released yet. This causes an
      unexpected SMC_CLC_DECL_ERR_REGRMB to be reported, and the SMC
      connection then has to fall back to TCP.
      
      This patch set handles two types of SMC_CLC_DECL_ERR_REGRMB exceptions
      from different perspectives.
      
      Patch 1: fix the 0x05000000/0x09990003 error.
      Patch 2: fix the 0x09990003 error.
      
      After those patches, there are no SMC_CLC_DECL_ERR_REGRMB exceptions in
      my test case any more.
      
      v1 -> v2:
      - add bugfix patch for SMC_CLC_DECL_ERR_REGRMB caused by server side
      v2 -> v3:
      - fix incorrect mail thread
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server · 4940a1fd
      D. Wythe authored
      The cause of SMC_CLC_DECL_ERR_REGRMB on the server is clear: whether a
      new SMC connection can be accepted depends not only on the limit on the
      number of connections, but also on the available rtoken entries. Since
      rtoken release is triggered by the peer, while the connection count is
      decreased locally, many things can happen in the window between the
      two.
      
      The only thing that needs to be mentioned is that, now that all
      connection creations are completely protected by the
      smc_server_lgr_pending lock, it is enough to check only the available
      entries in rtokens_used_mask.
      
      Fixes: cd6851f3 ("smc: remote memory buffers (RMBs)")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>