1. 30 Dec, 2021 9 commits
  2. 29 Dec, 2021 9 commits
  3. 28 Dec, 2021 9 commits
    • James McLaughlin's avatar
      igc: Fix TX timestamp support for non-MSI-X platforms · f85846bb
      James McLaughlin authored
      Time synchronization was not properly enabled on non-MSI-X platforms.
      
      Fixes: 2c344ae2 ("igc: Add support for TX timestamping")
      Signed-off-by: default avatarJames McLaughlin <james.mclaughlin@qsc.com>
      Reviewed-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: default avatarNechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      f85846bb
    • Vinicius Costa Gomes's avatar
      igc: Do not enable crosstimestamping for i225-V models · 1e81dcc1
      Vinicius Costa Gomes authored
      It was reported that when PCIe PTM is enabled, some lockups could
      be observed with some integrated i225-V models.
      
      While the issue is investigated, we can disable crosstimestamp for
      those models and see no loss of functionality, because those models
      don't have any support for time synchronization.
      
      Fixes: a90ec848 ("igc: Add support for PTP getcrosststamp()")
      Link: https://lore.kernel.org/all/924175a188159f4e03bd69908a91e606b574139b.camel@gmx.de/Reported-by: default avatarStefan Dietrich <roots@gmx.de>
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: default avatarNechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      1e81dcc1
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · 16fa29ae
      David S. Miller authored
      Dust Li says:
      
      ====================
      net/smc: fix kernel panic caused by race of smc_sock
      
      This patchset fixes the race between smc_release triggered by
      close(2) and cdc_handle triggered by underlaying RDMA device.
      
      The race is caused because the smc_connection may been released
      before the pending tx CDC messages got its CQEs. In order to fix
      this, I add a counter to track how many pending WRs we have posted
      through the smc_connection, and only release the smc_connection
      after there is no pending WRs on the connection.
      
      The first patch prevents posting WR on a QP that is not in RTS
      state. This patch is needed because if we post WR on a QP that
      is not in RTS state, ib_post_send() may success but no CQE will
      return, and that will confuse the counter tracking the pending
      WRs.
      
      The second patch add a counter to track how many WRs were posted
      through the smc_connection, and don't reset the QP on link destroying
      to prevent leak of the counter.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      16fa29ae
    • Dust Li's avatar
      net/smc: fix kernel panic caused by race of smc_sock · 349d4312
      Dust Li authored
      A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
      but smc_release() has already freed it.
      
      [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
      [ 4570.696048] #PF: supervisor write access in kernel mode
      [ 4570.696728] #PF: error_code(0x0002) - not-present page
      [ 4570.697401] PGD 0 P4D 0
      [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
      [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
      [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
      <...>
      [ 4570.711446] Call Trace:
      [ 4570.711746]  <IRQ>
      [ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
      [ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
      [ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
      [ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
      [ 4570.714083]  __do_softirq+0x123/0x2f4
      [ 4570.714521]  irq_exit_rcu+0xc4/0xf0
      [ 4570.714934]  common_interrupt+0xba/0xe0
      
      Though smc_cdc_tx_handler() checked the existence of smc connection,
      smc_release() may have already dismissed and released the smc socket
      before smc_cdc_tx_handler() further visits it.
      
      smc_cdc_tx_handler()           |smc_release()
      if (!conn)                     |
                                     |
                                     |smc_cdc_tx_dismiss_slots()
                                     |      smc_cdc_tx_dismisser()
                                     |
                                     |sock_put(&smc->sk) <- last sock_put,
                                     |                      smc_sock freed
      bh_lock_sock(&smc->sk) (panic) |
      
      To make sure we won't receive any CDC messages after we free the
      smc_sock, add a refcount on the smc_connection for inflight CDC
      message(posted to the QP but haven't received related CQE), and
      don't release the smc_connection until all the inflight CDC messages
      haven been done, for both success or failed ones.
      
      Using refcount on CDC messages brings another problem: when the link
      is going to be destroyed, smcr_link_clear() will reset the QP, which
      then remove all the pending CQEs related to the QP in the CQ. To make
      sure all the CQEs will always come back so the refcount on the
      smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
      by smc_ib_modify_qp_error().
      And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we
      need to wait for all pending WQEs done, or we may encounter use-after-
      free when handling CQEs.
      
      For IB device removal routine, we need to wait for all the QPs on that
      device been destroyed before we can destroy CQs on the device, or
      the refcount on smc_connection won't reach 0 and smc_sock cannot be
      released.
      
      Fixes: 5f08318f ("smc: connection data control (CDC)")
      Reported-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Signed-off-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      349d4312
    • Dust Li's avatar
      net/smc: don't send CDC/LLC message if link not ready · 90cee52f
      Dust Li authored
      We found smc_llc_send_link_delete_all() sometimes wait
      for 2s timeout when testing with RDMA link up/down.
      It is possible when a smc_link is in ACTIVATING state,
      the underlaying QP is still in RESET or RTR state, which
      cannot send any messages out.
      
      smc_llc_send_link_delete_all() use smc_link_usable() to
      checks whether the link is usable, if the QP is still in
      RESET or RTR state, but the smc_link is in ACTIVATING, this
      LLC message will always fail without any CQE entering the
      CQ, and we will always wait 2s before timeout.
      
      Since we cannot send any messages through the QP before
      the QP enter RTS. I add a wrapper smc_link_sendable()
      which checks the state of QP along with the link state.
      And replace smc_link_usable() with smc_link_sendable()
      in all LLC & CDC message sending routine.
      
      Fixes: 5f08318f ("smc: connection data control (CDC)")
      Signed-off-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90cee52f
    • Wei Yongjun's avatar
      NFC: st21nfca: Fix memory leak in device probe and remove · 1b9dadba
      Wei Yongjun authored
      'phy->pending_skb' is alloced when device probe, but forgot to free
      in the error handling path and remove path, this cause memory leak
      as follows:
      
      unreferenced object 0xffff88800bc06800 (size 512):
        comm "8", pid 11775, jiffies 4295159829 (age 9.032s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000d66c09ce>] __kmalloc_node_track_caller+0x1ed/0x450
          [<00000000c93382b3>] kmalloc_reserve+0x37/0xd0
          [<000000005fea522c>] __alloc_skb+0x124/0x380
          [<0000000019f29f9a>] st21nfca_hci_i2c_probe+0x170/0x8f2
      
      Fix it by freeing 'pending_skb' in error and remove.
      
      Fixes: 68957303 ("NFC: ST21NFCA: Add driver for STMicroelectronics ST21NFCA NFC Chip")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b9dadba
    • Aleksander Jan Bajkowski's avatar
      net: lantiq_xrx200: fix statistics of received bytes · 5be60a94
      Aleksander Jan Bajkowski authored
      Received frames have FCS truncated. There is no need
      to subtract FCS length from the statistics.
      
      Fixes: fe1a5642 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
      Signed-off-by: default avatarAleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5be60a94
    • Christophe JAILLET's avatar
      net: ag71xx: Fix a potential double free in error handling paths · 1cd5384c
      Christophe JAILLET authored
      'ndev' is a managed resource allocated with devm_alloc_etherdev(), so there
      is no need to call free_netdev() explicitly or there will be a double
      free().
      
      Simplify all error handling paths accordingly.
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1cd5384c
    • wolfgang huang's avatar
      mISDN: change function names to avoid conflicts · 8b5fdfc5
      wolfgang huang authored
      As we build for mips, we meet following error. l1_init error with
      multiple definition. Some architecture devices usually marked with
      l1, l2, lxx as the start-up phase. so we change the mISDN function
      names, align with Isdnl2_xxx.
      
      mips-linux-gnu-ld: drivers/isdn/mISDN/layer1.o: in function `l1_init':
      (.text+0x890): multiple definition of `l1_init'; \
      arch/mips/kernel/bmips_5xxx_init.o:(.text+0xf0): first defined here
      make[1]: *** [home/mips/kernel-build/linux/Makefile:1161: vmlinux] Error 1
      Signed-off-by: default avatarwolfgang huang <huangjinhui@kylinos.cn>
      Reported-by: default avatark2ci <kernel-bot@kylinos.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b5fdfc5
  4. 27 Dec, 2021 7 commits
    • Krzysztof Kozlowski's avatar
      nfc: uapi: use kernel size_t to fix user-space builds · 79b69a83
      Krzysztof Kozlowski authored
      Fix user-space builds if it includes /usr/include/linux/nfc.h before
      some of other headers:
      
        /usr/include/linux/nfc.h:281:9: error: unknown type name ‘size_t’
          281 |         size_t service_name_len;
              |         ^~~~~~
      
      Fixes: d646960f ("NFC: Initial LLCP support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79b69a83
    • Dmitry V. Levin's avatar
      uapi: fix linux/nfc.h userspace compilation errors · 7175f02c
      Dmitry V. Levin authored
      Replace sa_family_t with __kernel_sa_family_t to fix the following
      linux/nfc.h userspace compilation errors:
      
      /usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t'
        sa_family_t sa_family;
      /usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t'
        sa_family_t sa_family;
      
      Fixes: 23b7869c ("NFC: add the NFC socket raw protocol")
      Fixes: d646960f ("NFC: Initial LLCP support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7175f02c
    • Matthias-Christian Ott's avatar
      net: usb: pegasus: Do not drop long Ethernet frames · ca506fca
      Matthias-Christian Ott authored
      The D-Link DSB-650TX (2001:4002) is unable to receive Ethernet frames
      that are longer than 1518 octets, for example, Ethernet frames that
      contain 802.1Q VLAN tags.
      
      The frames are sent to the pegasus driver via USB but the driver
      discards them because they have the Long_pkt field set to 1 in the
      received status report. The function read_bulk_callback of the pegasus
      driver treats such received "packets" (in the terminology of the
      hardware) as errors but the field simply does just indicate that the
      Ethernet frame (MAC destination to FCS) is longer than 1518 octets.
      
      It seems that in the 1990s there was a distinction between
      "giant" (> 1518) and "runt" (< 64) frames and the hardware includes
      flags to indicate this distinction. It seems that the purpose of the
      distinction "giant" frames was to not allow infinitely long frames due
      to transmission errors and to allow hardware to have an upper limit of
      the frame size. However, the hardware already has such limit with its
      2048 octet receive buffer and, therefore, Long_pkt is merely a
      convention and should not be treated as a receive error.
      
      Actually, the hardware is even able to receive Ethernet frames with 2048
      octets which exceeds the claimed limit frame size limit of the driver of
      1536 octets (PEGASUS_MTU).
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarMatthias-Christian Ott <ott@mirix.org>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca506fca
    • Zekun Shen's avatar
      atlantic: Fix buff_ring OOB in aq_ring_rx_clean · 5f501532
      Zekun Shen authored
      The function obtain the next buffer without boundary check.
      We should return with I/O error code.
      
      The bug is found by fuzzing and the crash report is attached.
      It is an OOB bug although reported as use-after-free.
      
      [    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
      [    4.806505]
      [    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #34
      [    4.809030] Call Trace:
      [    4.809343]  dump_stack+0x76/0xa0
      [    4.809755]  print_address_description.constprop.0+0x16/0x200
      [    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.813183]  __kasan_report.cold+0x37/0x7c
      [    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.814393]  kasan_report+0xe/0x20
      [    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
      [    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
      [    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
      [    4.817746]  ? __next_timer_interrupt+0xba/0xf0
      [    4.818322]  net_rx_action+0x363/0xbd0
      [    4.818803]  ? call_timer_fn+0x240/0x240
      [    4.819302]  ? __switch_to_asm+0x40/0x70
      [    4.819809]  ? napi_busy_loop+0x520/0x520
      [    4.820324]  __do_softirq+0x18c/0x634
      [    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
      [    4.821343]  run_ksoftirqd+0x15/0x20
      [    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
      [    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
      [    4.823041]  ? __kthread_parkme+0x80/0x100
      [    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
      [    4.824301]  kthread+0x2b5/0x3b0
      [    4.824723]  ? kthread_create_on_node+0xd0/0xd0
      [    4.825304]  ret_from_fork+0x35/0x40
      Signed-off-by: default avatarZekun Shen <bruceshenzk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f501532
    • yangxingwu's avatar
      net: udp: fix alignment problem in udp4_seq_show() · 6c25449e
      yangxingwu authored
      $ cat /pro/net/udp
      
      before:
      
        sl  local_address rem_address   st tx_queue rx_queue tr tm->when
      26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
      26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
      27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000
      
      after:
      
         sl  local_address rem_address   st tx_queue rx_queue tr tm->when
      26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
      26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
      27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000
      Signed-off-by: default avataryangxingwu <xingwu.yang@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c25449e
    • Karsten Graul's avatar
      net/smc: fix using of uninitialized completions · 6d7373da
      Karsten Graul authored
      In smc_wr_tx_send_wait() the completion on index specified by
      pend->idx is initialized and after smc_wr_tx_send() was called the wait
      for completion starts. pend->idx is used to get the correct index for
      the wait, but the pend structure could already be cleared in
      smc_wr_tx_process_cqe().
      Introduce pnd_idx to hold and use a local copy of the correct index.
      
      Fixes: 09c61d24 ("net/smc: wait for departure of an IB message")
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d7373da
    • William Zhao's avatar
      ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate · c1833c39
      William Zhao authored
      The "__ip6_tnl_parm" struct was left uninitialized causing an invalid
      load of random data when the "__ip6_tnl_parm" struct was used elsewhere.
      As an example, in the function "ip6_tnl_xmit_ctl()", it tries to access
      the "collect_md" member. With "__ip6_tnl_parm" being uninitialized and
      containing random data, the UBSAN detected that "collect_md" held a
      non-boolean value.
      
      The UBSAN issue is as follows:
      ===============================================================
      UBSAN: invalid-load in net/ipv6/ip6_tunnel.c:1025:14
      load of value 30 is not a valid value for type '_Bool'
      CPU: 1 PID: 228 Comm: kworker/1:3 Not tainted 5.16.0-rc4+ #8
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      Workqueue: ipv6_addrconf addrconf_dad_work
      Call Trace:
      <TASK>
      dump_stack_lvl+0x44/0x57
      ubsan_epilogue+0x5/0x40
      __ubsan_handle_load_invalid_value+0x66/0x70
      ? __cpuhp_setup_state+0x1d3/0x210
      ip6_tnl_xmit_ctl.cold.52+0x2c/0x6f [ip6_tunnel]
      vti6_tnl_xmit+0x79c/0x1e96 [ip6_vti]
      ? lock_is_held_type+0xd9/0x130
      ? vti6_rcv+0x100/0x100 [ip6_vti]
      ? lock_is_held_type+0xd9/0x130
      ? rcu_read_lock_bh_held+0xc0/0xc0
      ? lock_acquired+0x262/0xb10
      dev_hard_start_xmit+0x1e6/0x820
      __dev_queue_xmit+0x2079/0x3340
      ? mark_lock.part.52+0xf7/0x1050
      ? netdev_core_pick_tx+0x290/0x290
      ? kvm_clock_read+0x14/0x30
      ? kvm_sched_clock_read+0x5/0x10
      ? sched_clock_cpu+0x15/0x200
      ? find_held_lock+0x3a/0x1c0
      ? lock_release+0x42f/0xc90
      ? lock_downgrade+0x6b0/0x6b0
      ? mark_held_locks+0xb7/0x120
      ? neigh_connected_output+0x31f/0x470
      ? lockdep_hardirqs_on+0x79/0x100
      ? neigh_connected_output+0x31f/0x470
      ? ip6_finish_output2+0x9b0/0x1d90
      ? rcu_read_lock_bh_held+0x62/0xc0
      ? ip6_finish_output2+0x9b0/0x1d90
      ip6_finish_output2+0x9b0/0x1d90
      ? ip6_append_data+0x330/0x330
      ? ip6_mtu+0x166/0x370
      ? __ip6_finish_output+0x1ad/0xfb0
      ? nf_hook_slow+0xa6/0x170
      ip6_output+0x1fb/0x710
      ? nf_hook.constprop.32+0x317/0x430
      ? ip6_finish_output+0x180/0x180
      ? __ip6_finish_output+0xfb0/0xfb0
      ? lock_is_held_type+0xd9/0x130
      ndisc_send_skb+0xb33/0x1590
      ? __sk_mem_raise_allocated+0x11cf/0x1560
      ? dst_output+0x4a0/0x4a0
      ? ndisc_send_rs+0x432/0x610
      addrconf_dad_completed+0x30c/0xbb0
      ? addrconf_rs_timer+0x650/0x650
      ? addrconf_dad_work+0x73c/0x10e0
      addrconf_dad_work+0x73c/0x10e0
      ? addrconf_dad_completed+0xbb0/0xbb0
      ? rcu_read_lock_sched_held+0xaf/0xe0
      ? rcu_read_lock_bh_held+0xc0/0xc0
      process_one_work+0x97b/0x1740
      ? pwq_dec_nr_in_flight+0x270/0x270
      worker_thread+0x87/0xbf0
      ? process_one_work+0x1740/0x1740
      kthread+0x3ac/0x490
      ? set_kthread_struct+0x100/0x100
      ret_from_fork+0x22/0x30
      </TASK>
      ===============================================================
      
      The solution is to initialize "__ip6_tnl_parm" struct to zeros in the
      "vti6_siocdevprivate()" function.
      Signed-off-by: default avatarWilliam Zhao <wizhao@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1833c39
  5. 25 Dec, 2021 2 commits
    • Ma Xinjian's avatar
      selftests: mptcp: Remove the deprecated config NFT_COUNTER · e6007b85
      Ma Xinjian authored
      NFT_COUNTER was removed since
      390ad4295aa ("netfilter: nf_tables: make counter support built-in")
      LKP/0Day will check if all configs listing under selftests are able to
      be enabled properly.
      
      For the missing configs, it will report something like:
      LKP WARN miss config CONFIG_NFT_COUNTER= of net/mptcp/config
      
      - it's not reasonable to keep the deprecated configs.
      - configs under kselftests are recommended by corresponding tests.
      So if some configs are missing, it will impact the testing results
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarMa Xinjian <xinjianx.ma@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6007b85
    • Xin Long's avatar
      sctp: use call_rcu to free endpoint · 5ec7d18d
      Xin Long authored
      This patch is to delay the endpoint free by calling call_rcu() to fix
      another use-after-free issue in sctp_sock_dump():
      
        BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
        Call Trace:
          __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
          lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
          __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
          _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
          spin_lock_bh include/linux/spinlock.h:334 [inline]
          __lock_sock+0x203/0x350 net/core/sock.c:2253
          lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
          lock_sock include/net/sock.h:1492 [inline]
          sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
          sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
          sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
          __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
          inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
          netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
          __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
          netlink_dump_start include/linux/netlink.h:216 [inline]
          inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
          __sock_diag_cmd net/core/sock_diag.c:232 [inline]
          sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
          netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
          sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
      
      This issue occurs when asoc is peeled off and the old sk is freed after
      getting it by asoc->base.sk and before calling lock_sock(sk).
      
      To prevent the sk free, as a holder of the sk, ep should be alive when
      calling lock_sock(). This patch uses call_rcu() and moves sock_put and
      ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
      hold the ep under rcu_read_lock in sctp_transport_traverse_process().
      
      If sctp_endpoint_hold() returns true, it means this ep is still alive
      and we have held it and can continue to dump it; If it returns false,
      it means this ep is dead and can be freed after rcu_read_unlock, and
      we should skip it.
      
      In sctp_sock_dump(), after locking the sk, if this ep is different from
      tsp->asoc->ep, it means during this dumping, this asoc was peeled off
      before calling lock_sock(), and the sk should be skipped; If this ep is
      the same with tsp->asoc->ep, it means no peeloff happens on this asoc,
      and due to lock_sock, no peeloff will happen either until release_sock.
      
      Note that delaying endpoint free won't delay the port release, as the
      port release happens in sctp_endpoint_destroy() before calling call_rcu().
      Also, freeing endpoint by call_rcu() makes it safe to access the sk by
      asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().
      
      Thanks Jones to bring this issue up.
      
      v1->v2:
        - improve the changelog.
        - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
      
      Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com
      Reported-by: default avatarLee Jones <lee.jones@linaro.org>
      Fixes: d25adbeb ("sctp: fix an use-after-free issue in sctp_sock_dump")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ec7d18d
  6. 24 Dec, 2021 4 commits