1. 25 Feb, 2022 18 commits
    • ibmvnic: Allow queueing resets during probe · fd98693c
      Sukadev Bhattiprolu authored
      We currently don't allow queueing resets when the adapter is in the
      VNIC_PROBING state; instead we throw away the reset and return EBUSY.
      The reasoning is probably that during ibmvnic_probe() the ibmvnic_adapter
      itself is being initialized, so performing a reset during this time can
      lead us to access fields in the ibmvnic_adapter that are not fully
      initialized. A review of the code shows that all the adapter state needed
      to process a reset is initialized before registering the CRQ, so that
      should no longer be a concern.
      
      Further, the expectation is that if we do get a reset (transport event)
      during probe, the do..while() loop in ibmvnic_probe() will handle it
      by reinitializing the CRQ.
      
      While that is true to some extent, it is possible that the reset might
      occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
      but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
      such a reset will be thrown away. While the client assumes that the
      adapter is functional, the vnic server will wait for the client to reinit
      the adapter. This disconnect between the two leaves the adapter down,
      requiring manual intervention.
      
      Because ibmvnic_probe() has other work to do after initializing the CRQ
      (such as registering the netdev at a minimum) and because the reset event
      can occur at any instant after the CRQ is initialized, there will always
      be a window between initializing the CRQ and considering the adapter
      ready for resets (i.e., state == PROBED).
      
      So rather than discarding resets during this window, allow queueing them,
      but only process them after the adapter is fully initialized.
      
      To do this, introduce a new completion state ->probe_done and have the
      reset worker thread wait on this before processing resets.
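A minimal single-threaded sketch of this gating, under stated assumptions: `struct adapter_model`, `queue_reset()` and `run_reset_worker()` are hypothetical names, not the driver's code, and a plain `bool` stands in for the `->probe_done` completion that the real worker would block on (for example with `wait_for_completion()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the reset queue (not the driver's structs). */
struct rwi {                    /* reset work item */
    int reason;
    struct rwi *next;
};

struct adapter_model {
    bool probe_done;            /* stands in for the ->probe_done completion */
    struct rwi *rwi_list;       /* FIFO of pending resets */
    int resets_processed;
};

/* Accept a reset at any time, including while probing, instead of
 * discarding it with -EBUSY. */
static int queue_reset(struct adapter_model *a, int reason)
{
    struct rwi *r = malloc(sizeof(*r));
    struct rwi **tail = &a->rwi_list;

    if (!r)
        return -1;
    r->reason = reason;
    r->next = NULL;
    while (*tail)               /* append at the tail to keep FIFO order */
        tail = &(*tail)->next;
    *tail = r;
    return 0;
}

/* Worker body. The driver would block until probe_done completes; this
 * model simply does nothing (returns 0) until probing has finished. */
static int run_reset_worker(struct adapter_model *a)
{
    int n = 0;

    if (!a->probe_done)
        return 0;
    while (a->rwi_list) {
        struct rwi *r = a->rwi_list;

        a->rwi_list = r->next;
        free(r);                /* "process" the reset, then free it */
        n++;
    }
    a->resets_processed += n;
    return n;
}
```

Resets queued while probing are kept rather than dropped, and are only consumed once probing has finished.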
      
      This change brings up two new situations in or just after ibmvnic_probe().
      First, after one or more resets are queued, we encounter an error and
      decide to retry the initialization. At that point the queued resets are
      no longer relevant since we could be talking to a new vnic server. So we
      must purge/flush the queued resets before restarting the initialization.
      As a side note, since we are still in the probing stage and have not
      registered the netdev, it will not be a CHANGE_PARAM reset.
      
      Second, this change opens up a potential race between the worker thread
      in __ibmvnic_reset(), the tasklet, and ibmvnic_open() due to the
      following sequence of events:
      
      	1. Register CRQ
      	2. Get transport event before CRQ_INIT completes.
      	3. Tasklet schedules reset:
      		a) add rwi to list
      		b) schedule_work() to start worker thread which runs
      		   and waits for ->probe_done.
      	4. ibmvnic_probe() decides to retry, purges rwi_list
      	5. Re-register the CRQ; this time the rest of probe succeeds: register
      	   netdev and complete(->probe_done).
      	6. Worker thread resumes in __ibmvnic_reset() from 3b.
      	7. Worker thread sets ->resetting bit
      	8. ibmvnic_open() comes in, notices ->resetting bit, sets state
      	   to IBMVNIC_OPEN and returns early expecting worker thread to
      	   finish the open.
      	9. Worker thread finds rwi_list empty and returns without
      	   opening the interface.
      
      If this happens, the ->ndo_open() call is effectively lost and the
      interface remains down. To address this, ensure that ->rwi_list is
      not empty before setting the ->resetting bit. See also comments in
      __ibmvnic_reset().
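The fix can be illustrated with a hypothetical model (`struct reset_state`, `worker_begin_reset()` and `model_open()` are illustrative names; a real driver would do the check under the lock that protects `->rwi_list`). The worker claims the resetting bit only when there is queued work, so an open that runs afterwards brings the interface up itself instead of deferring to a worker that will never finish it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative only. */
struct rwi { struct rwi *next; };

struct reset_state {
    struct rwi *rwi_list;   /* pending resets */
    bool resetting;         /* "a reset is in progress" */
    bool open_deferred;     /* open() left the work to the reset worker */
    bool opened;            /* open() brought the interface up itself */
};

/* Worker: claim the resetting bit only if there is work to do. Without
 * this check the worker could set ->resetting, find the list empty and
 * return, losing an open() that deferred to it. */
static bool worker_begin_reset(struct reset_state *s)
{
    if (s->rwi_list == NULL)
        return false;
    s->resetting = true;
    return true;
}

/* open(): defer to the worker only if a reset is really in progress. */
static void model_open(struct reset_state *s)
{
    if (s->resetting)
        s->open_deferred = true;
    else
        s->opened = true;
}
```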
      
      Fixes: 6a2fb0e9 ("ibmvnic: driver initialization for kdump/kexec")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: clear fop when retrying probe · f628ad53
      Sukadev Bhattiprolu authored
      Clear the ->failover_pending flag that may have been set in the previous
      pass of registering the CRQ. If we don't clear it, a subsequent
      ibmvnic_open() call would be misled into thinking a failover is pending
      and assuming that the reset worker thread would open the adapter. If this
      pass of registering the CRQ succeeds (i.e., there is no transport event),
      there would be no reset worker thread.
      
      This would leave the adapter unconfigured and require manual intervention
      to bring it up during boot.
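A sketch of the retry loop under a simplified model in which `events[i]` says whether a transport event hits pass `i`; `struct probe_model` and `probe_with_retry()` are hypothetical. Clearing the flag at the start of each pass keeps a value set during a failed pass from surviving into a successful one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the probe retry loop; names are illustrative. */
struct probe_model {
    bool failover_pending;
    int passes;
};

/* events[i] is true if a transport event occurs during pass i.
 * Returns 0 on the first pass without an event, -1 if all passes fail. */
static int probe_with_retry(struct probe_model *p, const bool *events,
                            int max_passes)
{
    for (int i = 0; i < max_passes; i++) {
        p->passes++;
        /* Stale state from a previous pass must not leak into this one. */
        p->failover_pending = false;
        if (events[i]) {
            p->failover_pending = true; /* set by the event handler */
            continue;                   /* retry registering the CRQ */
        }
        return 0;                       /* this pass succeeded */
    }
    return -1;
}
```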
      
      Fixes: 5a18e1e0 ("ibmvnic: Fix failover case for non-redundant configuration")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: init init_done_rc earlier · ae16bf15
      Sukadev Bhattiprolu authored
      We currently initialize the ->init_done completion/return code fields
      before issuing a CRQ_INIT command. But if we get a transport event soon
      after registering the CRQ, the tasklet may already have recorded the
      completion and error code. If we initialize here, we might overwrite/
      lose that and end up issuing the CRQ_INIT only to time out later.
      
      If that timeout happens during probe, we will leave the adapter in the
      DOWN state rather than retrying to register/init the CRQ.
      
      Initialize the completion before registering the CRQ so we don't lose
      the notification.
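The ordering problem can be modelled as follows. This is an assumption-laden sketch: `struct init_model`, `register_crq()` and `probe_pass()` are hypothetical, and `-ETIMEDOUT` simply stands for "sent CRQ_INIT and waited for a reply that never came".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the ->init_done completion and ->init_done_rc. */
struct init_model {
    bool init_done;
    int init_done_rc;
    bool event_pending;   /* a transport event fires right after registration */
};

static void reinit_init_done(struct init_model *m)
{
    m->init_done = false;
    m->init_done_rc = 0;
}

/* Registering the CRQ can immediately run the tasklet, which records
 * the error and completes init_done. */
static void register_crq(struct init_model *m)
{
    if (m->event_pending) {
        m->init_done_rc = -EIO;
        m->init_done = true;
    }
}

/* One probe pass; returns the rc the prober observes. */
static int probe_pass(struct init_model *m, bool init_first)
{
    if (init_first)
        reinit_init_done(m);   /* fixed order: initialize, then register */
    register_crq(m);
    if (!init_first)
        reinit_init_done(m);   /* old order: wipes the tasklet's result */
    return m->init_done ? m->init_done_rc : -ETIMEDOUT;
}
```

With the fixed order the prober sees the tasklet's error and can retry; with the old order the notification is lost.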
      
      Fixes: 032c5e82 ("Driver for IBM System i/p VNIC protocol")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: register netdev after init of adapter · 570425f8
      Sukadev Bhattiprolu authored
      Finish initializing the adapter before registering netdev so state
      is consistent.
      
      Fixes: c26eba03 ("ibmvnic: Update reset infrastructure to support tunable parameters")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: complete init_done on transport events · 36491f2d
      Sukadev Bhattiprolu authored
      If we get a transport event, set the error and mark the init as
      complete so that attempts to send CRQ_INIT or login fail sooner
      rather than waiting for the timeout.
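A simulated timed wait shows why completing on the transport event lets the waiter fail immediately. In this hypothetical sketch, `struct init_wait` and `wait_for_init()` are illustrative names, and one loop iteration stands for one tick of a timeout such as the one `wait_for_completion_timeout()` would enforce.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; not the driver's code. */
struct init_wait {
    bool done;
    int rc;
    int event_tick;   /* tick at which a transport event arrives, or -1 */
};

/* Simulated wait with a timeout. When the transport event arrives, its
 * handler sets the error and completes the wait, so the caller returns
 * early instead of running out the whole timeout. */
static int wait_for_init(struct init_wait *w, int timeout_ticks,
                         int *ticks_waited)
{
    for (int t = 0; t < timeout_ticks; t++) {
        if (t == w->event_tick) {
            w->rc = -EIO;     /* transport event: record the error ... */
            w->done = true;   /* ... and mark init as complete */
        }
        if (w->done) {
            *ticks_waited = t;
            return w->rc;
        }
    }
    *ticks_waited = timeout_ticks;
    return -ETIMEDOUT;
}
```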
      
      Fixes: bbd669a8 ("ibmvnic: Fix completion structure initialization")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: define flush_reset_queue helper · 83da53f7
      Sukadev Bhattiprolu authored
      Define and use a helper to flush the reset queue.
      
      Fixes: 2770a798 ("ibmvnic: Introduce hard reset recovery")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: initialize rc before completing wait · 765559b1
      Sukadev Bhattiprolu authored
      We should initialize ->init_done_rc before calling complete(). Otherwise
      the waiting thread may see ->init_done_rc as 0 before we have updated it
      and may assume that the CRQ was successful.
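The required ordering is publish-then-signal. As a user-space analogy (an assumption, not the driver's code), this sketch uses C11 release/acquire atomics in place of `complete()` and the waiting thread; `struct init_sync`, `complete_with_rc()` and `try_read_rc()` are hypothetical names. A reader that sees the completion flag is guaranteed to also see the return code stored before it.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state shared by the writer (tasklet) and reader (prober). */
struct init_sync {
    int init_done_rc;
    atomic_bool done;
};

/* Writer: store the return code BEFORE signalling completion. The release
 * store makes the rc visible to any reader that observes done == true. */
static void complete_with_rc(struct init_sync *s, int rc)
{
    s->init_done_rc = rc;
    atomic_store_explicit(&s->done, true, memory_order_release);
}

/* Reader: returns true and fills *rc only once completion is visible. */
static bool try_read_rc(struct init_sync *s, int *rc)
{
    if (!atomic_load_explicit(&s->done, memory_order_acquire))
        return false;
    *rc = s->init_done_rc;
    return true;
}
```

Swapping the two statements in `complete_with_rc()` reproduces the bug: a reader could observe `done` and still read a stale rc of 0.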
      
      Fixes: 6b278c0c ("ibmvnic delay complete()")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: free reset-work-item when flushing · 8d0657f3
      Sukadev Bhattiprolu authored
      Fix a tiny memory leak when flushing the reset work queue.
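A flush helper that unlinks and frees every queued item might look like this. It is a sketch with a hypothetical `struct rwi` and `flush_reset_queue()` signature; a real driver would also hold the lock protecting the queue while flushing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reset work item. */
struct rwi {
    int reason;
    struct rwi *next;
};

/* Unlink every queued item and free it. Merely resetting the list head
 * would orphan the items, which is the kind of leak being fixed.
 * Returns the number of items freed. */
static int flush_reset_queue(struct rwi **head)
{
    int freed = 0;

    while (*head) {
        struct rwi *r = *head;

        *head = r->next;
        free(r);
        freed++;
    }
    return freed;
}
```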
      
      Fixes: 2770a798 ("ibmvnic: Introduce hard reset recovery")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 31372fe9
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      1) Fix PMTU for IPv6 if the reported MTU minus the ESP overhead is
         smaller than 1280. From Jiri Bohac.
      
      2) Fix xfrm interface ID and inter address family tunneling when
         migrating xfrm states. From Yan Yan.
      
      3) Add missing xfrm interface ID initialization on xfrmi_changelink.
         From Antony Antony.
      
      4) Enforce validity of xfrm offload input flags so that userspace can't
         send undefined flags to the offload driver.
         From Leon Romanovsky.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dcb: flush lingering app table entries for unregistered devices · 91b0383f
      Vladimir Oltean authored
      If I'm not mistaken (and I don't think I am), the way in which the
      dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
      the application table with dynamically allocated struct dcb_app_type
      entries that are kept in the module-global dcb_app_list.
      
      However, nobody keeps exact track of these entries, and although
      dcb_ieee_delapp() is supposed to remove them, nobody does so when the
      interface goes away (example: driver unbinds from device). So the
      dcb_app_list will contain lingering entries with an ifindex that no
      longer matches any device in dcb_app_lookup().
      
      Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
      flushing the app table entries of interfaces that are now gone.
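A user-space sketch of the flush, assuming a simplified singly linked list: `struct app_entry` stands in for `struct dcb_app_type`, and `flush_app_entries()` is a hypothetical name. A kernel version would more likely walk the list with `list_for_each_entry_safe()` while holding the lock that protects it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an application-table entry; the real
 * struct dcb_app_type carries more fields. */
struct app_entry {
    int ifindex;
    struct app_entry *next;
};

/* On NETDEV_UNREGISTER for ifindex, drop and free every entry that still
 * refers to the departing device. Returns how many entries were removed. */
static int flush_app_entries(struct app_entry **list, int ifindex)
{
    int removed = 0;
    struct app_entry **pp = list;

    while (*pp) {
        struct app_entry *e = *pp;

        if (e->ifindex == ifindex) {
            *pp = e->next;        /* unlink, keeping pp in place */
            free(e);
            removed++;
        } else {
            pp = &e->next;        /* keep entry, advance */
        }
    }
    return removed;
}
```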
      
      In fact something like this used to be done as part of the initial
      commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
      essentially at module_exit time. That became dead code after commit
      7a6b6f51 ("DCB: fix kconfig option") which essentially merged
      "tristate config DCB" and "bool config DCBNL" into a single "bool config
      DCB", so net/dcb/dcbnl.c could not be built as a module anymore.
      
      Commit 36b9ad80 ("net/dcb: make dcbnl.c explicitly non-modular")
      recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
      leaving us with the version we have today.
      
      Since flushing application table entries can and should be done as soon
      as the netdevice disappears, fundamentally the commit that is to blame
      is the one that introduced the design of this API.
      
      Fixes: 9ab933ab ("dcbnl: add appliction tlv handlers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix connection leak · 9f1c50cf
      D. Wythe authored
      There's a potential leak issue under the following execution sequence:
      
      smc_release  				smc_connect_work
      if (sk->sk_state == SMC_INIT)
      					send_clc_confirm
      	tcp_abort();
      					...
      					sk.sk_state = SMC_ACTIVE
      smc_close_active
      switch(sk->sk_state) {
      ...
      case SMC_ACTIVE:
      	smc_close_final()
      	// then wait peer closed
      
      Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
      still in the TCP send buffer. In that case our connection token cannot
      be delivered to the server side, so we will never receive a passive
      close message, and the connection can never be fully closed.
      
      This patch takes a very simple approach to avoid this issue: if the
      state has changed to SMC_ACTIVE after tcp_abort(), we actively abort
      the SMC connection. Since the state was SMC_INIT before tcp_abort(),
      abandoning the complete disconnection process should not cause much
      of a problem.
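The decision can be expressed as a small pure function. This is a hypothetical sketch: the `enum conn_state` values and `choose_close()` are illustrative, not SMC's actual states or helpers.

```c
#include <assert.h>

/* Hypothetical, simplified states; not SMC's actual enum. */
enum conn_state { ST_INIT, ST_ACTIVE, ST_CLOSED };
enum close_action { CLOSE_GRACEFUL, CLOSE_ABORT };

/* before_abort: state sampled before tcp_abort(); now: state afterwards. */
static enum close_action choose_close(enum conn_state before_abort,
                                      enum conn_state now)
{
    /* The connect worker raced us into ACTIVE, but its CLC CONFIRM may have
     * been discarded by tcp_abort(), so no passive close will ever arrive:
     * abort instead of waiting for the peer. */
    if (before_abort == ST_INIT && now == ST_ACTIVE)
        return CLOSE_ABORT;
    return CLOSE_GRACEFUL;
}
```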
      
      In fact, this problem may exist whenever the CLC CONFIRM message is
      not received by the server. Whether a timer should be added after
      smc_close_final() remains to be discussed. Even so, this patch
      releases the connection faster in the case above, so it should
      still be valuable.
      
      Fixes: 39f41f36 ("net/smc: common release code for non-accepted sockets")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: only enable DMA interrupts when ready · 087a7b94
      Vincent Whitchurch authored
      In this driver's ->ndo_open() callback, it enables DMA interrupts,
      starts the DMA channels, then requests interrupts with request_irq(),
      and then finally enables napi.
      
      If RX DMA interrupts are received before napi is enabled, no processing
      is done because napi_schedule_prep() will return false.  If the network
      has a lot of broadcast/multicast traffic, then the RX ring could fill up
      completely before napi is enabled.  When this happens, no further RX
      interrupts will be delivered, and the driver will fail to receive any
      packets.
      
      Fix this by only enabling DMA interrupts after all other initialization
      is complete.
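A sketch of the fixed ordering, with stub steps recorded into a log so the order can be checked; `struct open_log`, `log_step()` and `stmmac_open_model()` are hypothetical and stand in for the real driver calls.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 8

/* Hypothetical log of the steps performed by ->ndo_open(). */
struct open_log {
    const char *steps[MAX_STEPS];
    int n;
};

static void log_step(struct open_log *l, const char *name)
{
    if (l->n < MAX_STEPS)
        l->steps[l->n++] = name;
}

/* Fixed order: DMA interrupts are enabled last, after NAPI, so no RX
 * interrupt arrives while napi_schedule_prep() would still return false. */
static void stmmac_open_model(struct open_log *l)
{
    log_step(l, "start_dma_channels");
    log_step(l, "request_irq");
    log_step(l, "napi_enable");
    log_step(l, "enable_dma_irq");
}
```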
      
      Fixes: 523f11b5 ("net: stmmac: move hardware setup for stmmac_open to new function")
      Reported-by: Lars Persson <larper@axis.com>
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen/netfront: destroy queues before real_num_tx_queues is zeroed · dcf4ff7a
      Marek Marczykowski-Górecki authored
      xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
      delete queues. Since d7dac083
      ("net-sysfs: update the queue counts in the unregistration path"),
      unregister_netdev() indirectly sets real_num_tx_queues to 0. Together,
      these two facts mean that xennet_destroy_queues(), called from
      xennet_remove(), cannot do its job, because it runs after
      unregister_netdev(). This results in kfree-ing queues that are still
      linked in napi, which ultimately crashes:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000000
          #PF: supervisor read access in kernel mode
          #PF: error_code(0x0000) - not-present page
          PGD 0 P4D 0
          Oops: 0000 [#1] PREEMPT SMP PTI
          CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
          RIP: 0010:free_netdev+0xa3/0x1a0
          Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
          RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
          RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
          RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
          RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
          R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
          FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
          Call Trace:
           <TASK>
           xennet_remove+0x13d/0x300 [xen_netfront]
           xenbus_dev_remove+0x6d/0xf0
           __device_release_driver+0x17a/0x240
           device_release_driver+0x24/0x30
           bus_remove_device+0xd8/0x140
           device_del+0x18b/0x410
           ? _raw_spin_unlock+0x16/0x30
           ? klist_iter_exit+0x14/0x20
           ? xenbus_dev_request_and_reply+0x80/0x80
           device_unregister+0x13/0x60
           xenbus_dev_changed+0x18e/0x1f0
           xenwatch_thread+0xc0/0x1a0
           ? do_wait_intr_irq+0xa0/0xa0
           kthread+0x16b/0x190
           ? set_kthread_struct+0x40/0x40
           ret_from_fork+0x22/0x30
           </TASK>
      
      Fix this by calling xennet_destroy_queues() from xennet_uninit(),
      while real_num_tx_queues is still available. This ensures that queues
      are destroyed before real_num_tx_queues is set to 0, regardless of how
      unregister_netdev() was called.
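The ordering dependency can be modelled in a few lines. In this hypothetical sketch, `unregister_model()` stands in for `unregister_netdev()` (run the uninit hook, then zero the queue count), and `destroy_queues()` iterates `real_num_tx_queues` the way the commit says `xennet_destroy_queues()` does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative. */
struct netdev_model {
    unsigned int real_num_tx_queues;
    unsigned int napi_deleted;   /* queues properly unlinked from NAPI */
};

/* Iterates real_num_tx_queues, so it does nothing once the count is 0. */
static void destroy_queues(struct netdev_model *d)
{
    for (unsigned int i = 0; i < d->real_num_tx_queues; i++)
        d->napi_deleted++;
}

/* Stand-in for unregister_netdev(): run the driver's uninit hook (if any),
 * then zero the queue count as the unregistration path does. */
static void unregister_model(struct netdev_model *d,
                             void (*ndo_uninit)(struct netdev_model *))
{
    if (ndo_uninit)
        ndo_uninit(d);
    d->real_num_tx_queues = 0;
}
```

Calling `destroy_queues()` from the uninit hook tears down every queue; calling it after unregistration (the old `xennet_remove()` order) tears down none.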
      
      Originally reported at
      https://github.com/QubesOS/qubes-issues/issues/7257
      
      Fixes: d7dac083 ("net-sysfs: update the queue counts in the unregistration path")
      Cc: stable@vger.kernel.org
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fixes-for-5-17' · a6df953f
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for 5.17
      
      Patch 1 fixes an issue with the SIOCOUTQ ioctl in MPTCP sockets that
      have performed a fallback to TCP.
      
      Patch 2 is a selftest fix to correctly remove temp files.
      
      Patch 3 fixes a shift-out-of-bounds issue found by syzkaller.
      ====================
      
      Link: https://lore.kernel.org/r/20220225005259.318898-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: Correctly set DATA_FIN timeout when number of retransmits is large · 877d11f0
      Mat Martineau authored
      Syzkaller with UBSAN uncovered a scenario where a large number of
      DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
      timeout calculation:
      
      ================================================================================
      UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
      shift exponent 32 is too large for 32-bit type 'unsigned int'
      CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      Workqueue: events mptcp_worker
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
       __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
       mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
       __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
       mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
       process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
       worker_thread+0x95/0xe10 kernel/workqueue.c:2454
       kthread+0x2f4/0x3b0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      ================================================================================
      
      This change limits the maximum timeout by limiting the size of the
      shift, which keeps all intermediate values in-bounds.
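A sketch of a clamped exponential backoff. The constants are assumptions (TCP's usual 200 ms minimum and 120 s maximum RTO, expressed in milliseconds), `datafin_timeout_ms()` is a hypothetical helper, and the message does not show the exact clamp the patch uses. Capping the shift at log2(max / min) keeps it far below the 32-bit width and the result at or below the maximum.

```c
#include <assert.h>

/* Assumed constants, in milliseconds. */
#define RTO_MIN_MS 200u
#define RTO_MAX_MS 120000u

/* Floor of log2(v) for v > 0. */
static unsigned int ilog2_u32(unsigned int v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Exponential backoff with a clamped shift, so a large retransmit count
 * can never produce an out-of-bounds shift. */
static unsigned int datafin_timeout_ms(unsigned int retransmits)
{
    unsigned int max_shift = ilog2_u32(RTO_MAX_MS / RTO_MIN_MS);
    unsigned int shift = retransmits < max_shift ? retransmits : max_shift;

    return RTO_MIN_MS << shift;
}
```

With these constants the shift is capped at 9, so the timeout saturates at 200 << 9 = 102400 ms.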
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
      Fixes: 6477dd39 ("mptcp: Retransmit DATA_FIN")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: do complete cleanup at exit · 63bb8239
      Paolo Abeni authored
      After commit 05be5e27 ("selftests: mptcp: add disconnect tests")
      the mptcp selftests leave behind a couple of tmp files after
      each run, because run_tests_disconnect() misnames a few variables
      used to track them. Address the issue by setting the appropriate
      global variables.
      
      Fixes: 05be5e27 ("selftests: mptcp: add disconnect tests")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: accurate SIOCOUTQ for fallback socket · 07c2c7a3
      Paolo Abeni authored
      The MPTCP SIOCOUTQ implementation is not very accurate in
      case of fallback: it only measures the data in the MPTCP-level
      write queue, but it does not take into account the subflow
      write queue utilization. In case of fallback the former can be
      empty while the latter is not.
      
      This produces sporadic self-test failures and can mislead
      legitimate user-space applications.
      
      Fix the issue by additionally querying the subflow in case of fallback.
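A sketch of the corrected accounting under a simplified model: `struct mptcp_model` and `mptcp_outq()` are hypothetical, and the subflow's queued bytes are taken as `write_seq - snd_una`, which is what `SIOCOUTQ` reports for a plain TCP socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a TCP subflow's send state. */
struct tcp_model {
    uint32_t write_seq;   /* next sequence number to be queued */
    uint32_t snd_una;     /* oldest unacknowledged sequence number */
};

struct mptcp_model {
    uint64_t msk_queued;      /* bytes still in the MPTCP-level write queue */
    bool fallback;            /* connection fell back to plain TCP */
    struct tcp_model subflow; /* the single subflow used after fallback */
};

static uint64_t mptcp_outq(const struct mptcp_model *m)
{
    uint64_t outq = m->msk_queued;

    /* After fallback the MPTCP-level queue can be empty while data still
     * sits in the subflow, so add the subflow's share. The unsigned 32-bit
     * subtraction handles sequence-number wraparound. */
    if (m->fallback)
        outq += (uint32_t)(m->subflow.write_seq - m->subflow.snd_una);
    return outq;
}
```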
      
      Fixes: 644807e3 ("mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/260
      Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 8a727100
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix regression with RFCOMM
       - Fix regression with LE devices using Privacy (RPA)
       - Fix regression with LE devices not waiting proper timeout to
         establish connections
       - Fix race in smp
      
      * tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: hci_sync: Fix not using conn_timeout
        Bluetooth: hci_sync: Fix hci_update_accept_list_sync
        Bluetooth: assign len after null check
        Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
        Bluetooth: fix data races in smp_unregister(), smp_del_chan()
        Bluetooth: hci_core: Fix leaking sent_cmd skb
      ====================
      
      Link: https://lore.kernel.org/r/20220224210838.197787-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 24 Feb, 2022 22 commits