1. 06 Sep, 2022 2 commits
    • net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data · 42b998d4
      Christian Marangi authored
      of_device_get_match_data is called on priv->dev before priv->dev is
      actually set. Move of_device_get_match_data after priv->dev is correctly
      set to fix this kernel panic.
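
The ordering problem described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_dev`, `fake_priv`, `get_match_data()` and `fake_probe()` are hypothetical stand-ins, not the driver's real types or functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct fake_dev {
	const void *match_data;
};

/* Hypothetical stand-in for the driver's private data. */
struct fake_priv {
	struct fake_dev *dev;
	const void *info;
};

/* Models of_device_get_match_data(): it dereferences dev, so dev must be valid. */
static const void *get_match_data(const struct fake_dev *dev)
{
	return dev->match_data;
}

/* Fixed ordering: assign priv->dev first, then look up the match data. */
static int fake_probe(struct fake_priv *priv, struct fake_dev *dev)
{
	if (!priv || !dev)
		return -1;

	priv->dev = dev;                        /* must happen first */
	priv->info = get_match_data(priv->dev); /* safe: priv->dev is set */
	return priv->info ? 0 : -1;
}
```

Swapping the two assignments in `fake_probe()` would reproduce the bug class: the lookup would run on a pointer that has not been assigned yet.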
      
      Fixes: 3bb0844e ("net: dsa: qca8k: cache match data to speed up access")
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220904215319.13070-1-ansuelsmth@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tcp: fix early ETIMEDOUT after spurious non-SACK RTO · 686dc2db
      Neal Cardwell authored
      Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
      of a spurious non-SACK RTO could cause a connection to fail to clear
      retrans_stamp, causing a later RTO to very prematurely time out the
      connection with ETIMEDOUT.
      
      Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
      report:
      
      (*1) Send one data packet on a non-SACK connection
      
      (*2) Because no ACK packet is received, the packet is retransmitted
           and we enter CA_Loss; but this retransmission is spurious.
      
      (*3) The ACK for the original data is received. The transmitted packet
           is acknowledged.  The TCP timestamp is before the retrans_stamp,
           so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
           true without changing state to Open (because tcp_is_sack() is
           false), and tcp_process_loss() returns without calling
           tcp_try_undo_recovery().  Normally after undoing a CA_Loss
           episode, tcp_fastretrans_alert() would see that the connection
           has returned to CA_Open and fall through and call
           tcp_try_to_open(), which would set retrans_stamp to 0.  However,
           for non-SACK connections we hold the connection in CA_Loss, so do
           not fall through to call tcp_try_to_open() and do not set
           retrans_stamp to 0. So retrans_stamp is (erroneously) still
           non-zero.
      
           At this point the first "retransmission event" has passed and
           been recovered from. Any future retransmission is a completely
           new "event". However, retrans_stamp is erroneously still
           set. (And we are still in CA_Loss, which is correct.)
      
      (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
           packet is sent. Note: No data is transmitted between (*3) and
           (*4) and we disabled keep alives.
      
           The socket's timeout SHOULD be calculated from this point in
           time, but instead it's calculated from the prior "event" 16
           minutes ago (step (*2)).
      
      (*5) Because no ACK packet is received, the packet is retransmitted.
      
      (*6) At the time of the 2nd retransmission, the socket returns
           ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
           too far in the past (set at the time of (*2)).
      
      This commit fixes this bug by ensuring that we reuse in
      tcp_try_undo_loss() the same careful logic for non-SACK connections
      that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
      we factor out that logic into a new
      tcp_is_non_sack_preventing_reopen() helper and call that helper from
      both undo functions.
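
The effect of a stale retrans_stamp can be illustrated with a small userspace model. This is a simplified sketch, not the kernel logic: `struct conn`, `on_retransmit()`, `on_undo()` and `timed_out()` are hypothetical names, times are plain seconds, and the real helper additionally checks whether any retransmission is still outstanding before clearing the stamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (times in seconds). */
struct conn {
	uint32_t retrans_stamp; /* start of the current retransmission episode, 0 = none */
};

/* Record a retransmission; the stamp is only set when no episode is in progress. */
static void on_retransmit(struct conn *c, uint32_t now)
{
	if (c->retrans_stamp == 0)
		c->retrans_stamp = now;
}

/* Called when a loss episode is undone. Passing clear_stamp = false models
 * the buggy non-SACK path that left the stamp set. */
static void on_undo(struct conn *c, bool clear_stamp)
{
	if (clear_stamp)
		c->retrans_stamp = 0;
}

/* The connection times out once the episode has lasted `timeout` seconds. */
static bool timed_out(const struct conn *c, uint32_t now, uint32_t timeout)
{
	return c->retrans_stamp != 0 && now - c->retrans_stamp >= timeout;
}
```

With a 900-second timeout, a retransmission at t=100 that is undone without clearing the stamp makes a fresh retransmission at t=1100 look 1000 seconds old, so `timed_out()` fires immediately. With the stamp cleared, the same retransmission starts a new episode.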
      
      Fixes: da34ac76 ("tcp: only undo on partial ACKs in CA_Loss")
      Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
      Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 05 Sep, 2022 8 commits
    • Merge tag 'for-net-2022-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · beb43252
      David S. Miller authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix regression preventing ACL packet transmission
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: intel: Simplify intel_eth_pci_remove() · 1621e70f
      Christophe JAILLET authored
      There is no point in calling pcim_iounmap_regions() in the remove
      function; it frees a managed resource that would be released by the
      framework anyway.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: debugfs: fix memory leak when using debugfs_lookup() · fe2c9c61
      Greg Kroah-Hartman authored
      When calling debugfs_lookup() the result must have dput() called on it,
      otherwise the memory will leak over time.  Fix this up to be much
      simpler logic and only create the root debugfs directory once when the
      driver is first accessed.  That resolves the memory leak and makes
      things more obvious as to what the intent is.
      
      Cc: Marcin Wojtas <mw@semihalf.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Cc: stable <stable@kernel.org>
      Fixes: 21da57a2 ("net: mvpp2: add a debugfs interface for the Header Parser")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: fix out-of-bounds read when setting HMAC data. · 84a53580
      David Lebrun authored
      The SRv6 layer allows defining HMAC data that can later be used to sign IPv6
      Segment Routing Headers. This configuration is realised via netlink through
      four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and
      SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual
      length of the SECRET attribute, it is possible to provide invalid combinations
      (e.g., secret = "", secretlen = 64). This case is not checked in the code and
      with an appropriately crafted netlink message, an out-of-bounds read of up
      to 64 bytes (max secret length) can occur past the skb end pointer and into
      skb_shared_info:
      
      Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
      208		memcpy(hinfo->secret, secret, slen);
      (gdb) bt
       #0  seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
       #1  0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600,
          extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>,
          family=<optimized out>) at net/netlink/genetlink.c:731
       #2  0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00,
          family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775
       #3  genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792
       #4  0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>)
          at net/netlink/af_netlink.c:2501
       #5  0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803
       #6  0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000)
          at net/netlink/af_netlink.c:1319
       #7  netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>)
          at net/netlink/af_netlink.c:1345
       #8  0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921
      ...
      (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end
      $1 = 0xffff88800b1b76c0
      (gdb) p/x secret
      $2 = 0xffff88800b1b76c0
      (gdb) p slen
      $3 = 64 '@'
      
      The OOB data can then be read back from userspace by dumping HMAC state. This
      commit fixes this by ensuring SECRETLEN cannot exceed the actual length of
      SECRET.
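
The length check can be sketched in userspace C as follows. This is a minimal model, not the netlink handler itself: `struct hmac_info`, `set_hmac_secret()` and `MAX_SECRET_LEN` are hypothetical names, with `attr_len` standing for the real length of the SECRET attribute and `claimed_len` for the separately supplied SECRETLEN value.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_SECRET_LEN 64 /* the 64-byte maximum secret length mentioned above */

/* Hypothetical stand-in for the stored HMAC state. */
struct hmac_info {
	unsigned char secret[MAX_SECRET_LEN];
	size_t slen;
};

/* Copy the secret only if the claimed length fits both the destination
 * buffer and the data that was actually supplied. */
static int set_hmac_secret(struct hmac_info *h, const unsigned char *secret,
			   size_t attr_len, size_t claimed_len)
{
	if (claimed_len > MAX_SECRET_LEN)
		return -EINVAL;
	if (claimed_len > attr_len) /* models the check the fix adds */
		return -EINVAL;

	memcpy(h->secret, secret, claimed_len);
	h->slen = claimed_len;
	return 0;
}
```

A 4-byte secret with a claimed length of 64 is rejected with -EINVAL instead of reading 60 bytes past the end of the source buffer.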
      Reported-by: Lucas Leong <wmliang.tw@gmail.com>
      Tested: verified that EINVAL is correctly returned when secretlen > len(secret)
      Fixes: 4f4853dc ("ipv6: sr: implement API to control SR HMAC structure")
      Signed-off-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bonding-fixes' · 060ad609
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      bonding: fix lladdr finding and confirmation
      
      This patch set fixes three issues when setting an lladdr as a bonding
      IPv6 target. Please see each patch for the details.
      
      v2: separate the patch to 3 parts
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: accept unsolicited NA message · 592335a4
      Hangbin Liu authored
      An unsolicited NA message with the all-nodes multicast destination
      address should be accepted as valid, since it also shows that the link
      can reach the target.
      
      Also rename bond_validate_ns() to bond_validate_na().
      Reported-by: LiLiang <liali@redhat.com>
      Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add all node mcast address when slave up · fd16eb94
      Hangbin Liu authored
      When a link is enslaved to a bond, the interface must be set down first.
      This makes the slave remove the MAC multicast address 33:33:00:00:00:01
      (the IPv6 multicast address ff02::1 is kept even when the interface is
      down). When the bond sets the slave up, ipv6_mc_up() is not called due to
      commit c2edacf8 ("bonding / ipv6: no addrconf for slaves separately from
      master").

      This was not an issue before the lladdr target feature was added to
      bonding, as the MAC multicast address is added back when the bond
      interface comes up and joins group ff02::1.

      With the lladdr target feature, however, when the user sets an lladdr
      target, the unsolicited NA message with the all-nodes multicast
      destination is dropped because the slave interface never adds
      33:33:00:00:00:01 back.

      Fix this by calling ipv6_mc_up() to add 33:33:00:00:00:01 back when
      the slave interface comes up.
      Reported-by: LiLiang <liali@redhat.com>
      Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use unspecified address if no available link local address · b7f14132
      Hangbin Liu authored
      When ns_ip6_target is set, ipv6_dev_get_saddr() is called to get an
      available source address for sending the IPv6 neighbor solicitation
      message.

      If the target is a global address, ipv6_dev_get_saddr() will return any
      available source address. But if the target is a link-local address,
      ipv6_dev_get_saddr() will only return an available address from our own
      interface, i.e. the corresponding bond interface.

      Before the bond interface is up, however, all of its addresses are
      tentative, and ipv6_dev_get_saddr() ignores tentative addresses. As a
      result, no link-local source address is found, bond_ns_send() is not
      called, and no NS message is sent. The bond interface then stays in the
      down state.

      Fix this by sending the NS with the unspecified address if there is no
      available source address.
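
The fallback can be sketched as a small userspace helper. This is an illustration only: `pick_ns_source()` is a hypothetical name, and the `found` flag models whether a usable (non-tentative) source address was returned by the lookup.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the candidate source address when one was found, otherwise fall
 * back to the unspecified address "::" so the NS can still be sent. */
static struct in6_addr pick_ns_source(bool found, const struct in6_addr *candidate)
{
	if (found && candidate)
		return *candidate;
	return in6addr_any; /* unspecified address */
}
```

The point of the design is that a failed lookup no longer suppresses the solicitation; it only changes which source address is used.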
      Reported-by: LiLiang <liali@redhat.com>
      Fixes: 5e1eeef6 ("bonding: NS target should accept link local address")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Sep, 2022 1 commit
  4. 03 Sep, 2022 12 commits
  5. 02 Sep, 2022 11 commits
    • Bluetooth: hci_sync: Fix hci_read_buffer_size_sync · be318363
      Luiz Augusto von Dentz authored
      hci_read_buffer_size_sync shall not use HCI_OP_LE_READ_BUFFER_SIZE_V2
      since that is LE specific; instead it is the hci_le_read_buffer_size_sync
      version that shall use it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216382
      Fixes: 26afbd82 ("Bluetooth: Add initial implementation of CIS connections")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • iavf: Detach device during reset task · aa626da9
      Ivan Vecera authored
      iavf_reset_task() takes crit_lock at the beginning and holds
      it for the whole call. The function subsequently calls
      iavf_init_interrupt_scheme(), which grabs RTNL. A problem occurs
      when, during the reset task, userspace initiates any ndo callback
      that runs under RTNL, such as iavf_open(), because some of those
      functions try to take crit_lock. This leads to a classic A-B B-A
      deadlock scenario.

      To resolve this situation, the device should be detached in
      iavf_reset_task() prior to taking crit_lock, to avoid subsequent
      ndos running under RTNL, and reattached at the end.
      
      Fixes: 62fe2a86 ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability")
      Cc: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
      Cc: SlawomirX Laba <slawomirx.laba@intel.com>
      Tested-by: Vitaly Grinberg <vgrinber@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • i40e: Fix kernel crash during module removal · fb8396ae
      Ivan Vecera authored
      The driver incorrectly frees client instance and subsequent
      i40e module removal leads to kernel crash.
      
      Reproducer:
      1. Do ethtool offline test followed immediately by another one
      host# ethtool -t eth0 offline; ethtool -t eth0 offline
      2. Remove recursively irdma module that also removes i40e module
      host# modprobe -r irdma
      
      Result:
      [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting
      [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished
      [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting
      [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished
      [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110
      [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2
      [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01
      [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1
      [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [ 8687.768755] #PF: supervisor read access in kernel mode
      [ 8687.773895] #PF: error_code(0x0000) - not-present page
      [ 8687.779034] PGD 0 P4D 0
      [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G        W I        5.19.0+ #2
      [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019
      [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e]
      [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b
      [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202
      [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000
      [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000
      [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000
      [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0
      [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008
      [ 8687.870342] FS:  00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000
      [ 8687.878427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0
      [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 8687.905572] PKRU: 55555554
      [ 8687.908286] Call Trace:
      [ 8687.910737]  <TASK>
      [ 8687.912843]  i40e_remove+0x2c0/0x330 [i40e]
      [ 8687.917040]  pci_device_remove+0x33/0xa0
      [ 8687.920962]  device_release_driver_internal+0x1aa/0x230
      [ 8687.926188]  driver_detach+0x44/0x90
      [ 8687.929770]  bus_remove_driver+0x55/0xe0
      [ 8687.933693]  pci_unregister_driver+0x2a/0xb0
      [ 8687.937967]  i40e_exit_module+0xc/0xf48 [i40e]
      
      Two offline tests cause an IRDMA driver failure (ETIMEDOUT), and this
      failure is indicated back to i40e_client_subtask(), which calls
      i40e_client_del_instance() to free the client instance referenced
      by pf->cinst and sets this pointer to NULL. During module
      removal, i40e_remove() calls i40e_lan_del_device(), which dereferences
      pf->cinst, which is NULL -> crash.
      Do not remove the client instance when the client open callback fails;
      just clear the __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs
      to handle this situation (when the netdev is up and the client
      is NOT opened) in i40e_notify_client_of_netdev_close() and
      call the client close callback only when __I40E_CLIENT_INSTANCE_OPENED
      is set.
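
The approach of keeping the instance alive and tracking its state with a flag can be sketched as below. This is a userspace model under stated assumptions: `struct client_instance`, `handle_open_result()` and `notify_close()` are hypothetical names, and the `opened` field stands in for the __I40E_CLIENT_INSTANCE_OPENED bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the client instance referenced by pf->cinst. */
struct client_instance {
	bool opened;     /* models the __I40E_CLIENT_INSTANCE_OPENED bit */
	int close_calls; /* counts invocations of the close callback */
};

/* On open failure, keep the instance allocated and only clear the flag. */
static void handle_open_result(struct client_instance *cinst, int open_ret)
{
	cinst->opened = (open_ret == 0);
}

/* Invoke the close callback only for an instance that was actually opened. */
static void notify_close(struct client_instance *cinst)
{
	if (!cinst || !cinst->opened)
		return;
	cinst->close_calls++;
	cinst->opened = false;
}
```

Because the instance is never freed on an open failure, later teardown code can still dereference it safely, and the flag prevents a close callback for a client that was never opened.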
      
      Fixes: 0ef2d5af ("i40e: KISS the client interface")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • i40e: Fix ADQ rate limiting for PF · 45bb006d
      Przemyslaw Patynowski authored
      Fix HW rate limiting for ADQ.
      Fall back to kernel queue selection for ADQ, as it is the network stack
      that decides which queue to use for transmit with ADQ configured.
      Reset the PF after creation of the VMDq2 VSIs required for ADQ, so as to
      reprogram TX queue contexts in i40e_configure_tx_ring.
      Without this patch the PF would limit TX rate only according to TC0.
      
      Fixes: a9ce82f7 ("i40e: Enable 'channel' mode in mqprio for TC configs")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
      Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: use bitmap_free instead of devm_kfree · 59ac3255
      Michal Swiatkowski authored
      pf->avail_txqs was allocated using bitmap_zalloc, so bitmap_free should
      be used to free this memory.
      
      Fixes: 78b5713a ("ice: Alloc queue management bitmaps and arrays dynamically")
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix DMA mappings leak · 7e753eb6
      Przemyslaw Patynowski authored
      Fix leak, when user changes ring parameters.
      During reallocation of RX buffers, new DMA mappings are created for
      those buffers. New buffers with different RX ring count should
      substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq
      and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused
      leak of already mapped DMA.
      Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate
      back to rx_buf, when BPF program unloads.
      If BPF program is loaded/unloaded and XSK pools are created, reallocate
      RX queues accordingly in XDP_SETUP_XSK_POOL handler.
      
      Steps for reproduction:
      while :
      do
      	for ((i=0; i<=8160; i=i+32))
      	do
      		ethtool -G enp130s0f0 rx $i tx $i
      		sleep 0.5
      		ethtool -g enp130s0f0
      	done
      done
      
      Fixes: 617f3e1b ("ice: xsk: allocate separate memory for XDP SW ring")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge tag 'rxrpc-fixes-20220901' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · e7506d34
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc fixes
      Here are some fixes for AF_RXRPC:
      
       (1) Fix the handling of ICMP/ICMP6 packets.  This is a problem due to
           rxrpc being switched to acting as a UDP tunnel, thereby allowing it to
           steal the packets before they go through the UDP Rx queue.  UDP
           tunnels can't get ICMP/ICMP6 packets, however.  This patch adds an
           additional encap hook so that they can.
      
       (2) Fix the encryption routines in rxkad to handle packets that have more
           than three parts correctly.  The problem is that ->nr_frags doesn't
           count the initial fragment, so the sglist ends up too short.
      
       (3) Fix a problem with destruction of the local endpoint potentially
           getting repeated.
      
       (4) Fix the calculation of the time at which to resend.
           jiffies_to_usecs() gives microseconds, not nanoseconds.
      
       (5) Fix AFS to work out when callback promises and locks expire based on
           the time an op was issued rather than the time the first reply packet
           arrives.  We don't know how long the server took between calculating
           the expiry interval and transmitting the reply.
      
       (6) Given (5), rxrpc_get_reply_time() is no longer used, so remove it.
      ====================
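
Items (2) and (4) above both come down to small arithmetic slips, which the following userspace sketch illustrates. All names here (`ticks_to_usecs()`, `resend_threshold_ns()`, `sg_entries_needed()`, `HZ_ASSUMED`) are hypothetical, and the tick rate is an arbitrary value chosen for the example. The sketch also assumes `now_ns` is larger than the converted timeout.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define USEC_PER_SEC  1000000ULL
#define HZ_ASSUMED    250ULL /* hypothetical tick rate, for illustration only */

/* Models jiffies_to_usecs(): the result is in microseconds. */
static uint64_t ticks_to_usecs(uint64_t ticks)
{
	return ticks * USEC_PER_SEC / HZ_ASSUMED;
}

/* Packets sent before this nanosecond timestamp are old enough to resend.
 * The microsecond value is scaled to nanoseconds before the subtraction. */
static uint64_t resend_threshold_ns(uint64_t now_ns, uint64_t rto_ticks)
{
	return now_ns - ticks_to_usecs(rto_ticks) * NSEC_PER_USEC;
}

/* One scatterlist entry for the initial part of the packet, plus one per
 * page fragment counted by nr_frags. */
static unsigned int sg_entries_needed(unsigned int nr_frags)
{
	return nr_frags + 1;
}
```

Omitting the `NSEC_PER_USEC` scaling would make the threshold 1000 times too close to the current time, and omitting the `+ 1` would leave the scatterlist one entry short.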
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: TX zerocopy should not sense pfmemalloc status · 32614006
      Eric Dumazet authored
      We got a recent syzbot report [1] showing a possible misuse
      of pfmemalloc page status in TCP zerocopy paths.
      
      Indeed, for pages coming from user space or other layers,
      using page_is_pfmemalloc() is moot, and possibly could give
      false positives.
      
      There have been attempts to make page_is_pfmemalloc() more robust,
      but not using it in the first place in this context is probably better,
      and saves CPU cycles.
      
      Note to stable teams :
      
      You need to backport 84ce071e ("net: introduce
      __skb_fill_page_desc_noacc") as a prereq.
      
      Race is more probable after commit c07aea3e
      ("mm: add a signature in struct page") because page_is_pfmemalloc()
      is now using low order bit from page->lru.next, which can change
      more often than page->index.
      
      Low order bit should never be set for lru.next (when used as an anchor
      in LRU list), so KCSAN report is mostly a false positive.
      
      Backporting to older kernel versions seems not necessary.
      
      [1]
      BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag
      
      write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0:
      __list_add include/linux/list.h:73 [inline]
      list_add include/linux/list.h:88 [inline]
      lruvec_add_folio include/linux/mm_inline.h:105 [inline]
      lru_add_fn+0x440/0x520 mm/swap.c:228
      folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
      folio_batch_add_and_move mm/swap.c:263 [inline]
      folio_add_lru+0xf1/0x140 mm/swap.c:490
      filemap_add_folio+0xf8/0x150 mm/filemap.c:948
      __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981
      pagecache_get_page+0x26/0x190 mm/folio-compat.c:104
      grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116
      ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988
      generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738
      ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270
      ext4_file_write_iter+0x2e3/0x1210
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x468/0x760 fs/read_write.c:578
      ksys_write+0xe8/0x1a0 fs/read_write.c:631
      __do_sys_write fs/read_write.c:643 [inline]
      __se_sys_write fs/read_write.c:640 [inline]
      __x64_sys_write+0x3e/0x50 fs/read_write.c:640
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1:
      page_is_pfmemalloc include/linux/mm.h:1740 [inline]
      __skb_fill_page_desc include/linux/skbuff.h:2422 [inline]
      skb_fill_page_desc include/linux/skbuff.h:2443 [inline]
      tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018
      do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075
      tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline]
      tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150
      inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
      kernel_sendpage+0x184/0x300 net/socket.c:3561
      sock_sendpage+0x5a/0x70 net/socket.c:1054
      pipe_to_sendpage+0x128/0x160 fs/splice.c:361
      splice_from_pipe_feed fs/splice.c:415 [inline]
      __splice_from_pipe+0x222/0x4d0 fs/splice.c:559
      splice_from_pipe fs/splice.c:594 [inline]
      generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
      do_splice_from fs/splice.c:764 [inline]
      direct_splice_actor+0x80/0xa0 fs/splice.c:931
      splice_direct_to_actor+0x305/0x620 fs/splice.c:886
      do_splice_direct+0xfb/0x180 fs/splice.c:974
      do_sendfile+0x3bf/0x910 fs/read_write.c:1249
      __do_sys_sendfile64 fs/read_write.c:1317 [inline]
      __se_sys_sendfile64 fs/read_write.c:1303 [inline]
      __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x0000000000000000 -> 0xffffea0004a1d288
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
      
      Fixes: c07aea3e ("mm: add a signature in struct page")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix shift wrapping bug in map_get() · e2b224ab
      Dan Carpenter authored
      There is a shift wrapping bug in this code, so anything above
      31 will return false.
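
A userspace sketch of a bit lookup that avoids this class of bug is shown below. The function name follows the commit title, but the body is an illustration rather than the kernel source, and it assumes `i` is in the range 0..63.

```c
#include <stdint.h>

/* Return bit i (0..63) of a 64-bit map. The mask is built from a 64-bit
 * constant, so bits 32..63 are reachable. A mask built from a plain int
 * constant cannot represent those bits. */
static int map_get(uint64_t up_map, int i)
{
	return (int)((up_map & (1ULL << i)) >> i);
}
```

For example, with only bit 40 set in the map, `map_get(map, 40)` returns 1 and `map_get(map, 39)` returns 0.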
      
      Fixes: 35c55c98 ("tipc: add neighbor monitoring framework")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_sfb: Don't assume the skb is still around after enqueueing to child · 9efd2329
      Toke Høiland-Jørgensen authored
      The sch_sfb enqueue() routine assumes the skb is still alive after it has
      been enqueued into a child qdisc, using the data in the skb cb field in the
      increment_qlen() routine after enqueue. However, the skb may in fact have
      been freed, causing a use-after-free in this case. In particular, this
      happens if sch_cake is used as a child of sfb, and the GSO splitting mode
      of CAKE is enabled (in which case the skb will be split into segments and
      the original skb freed).
      
      Fix this by copying the sfb cb data to the stack before enqueueing the skb,
      and using this stack copy in increment_qlen() instead of the skb pointer
      itself.
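
The copy-before-handoff pattern can be sketched in userspace C. This is a simplified model, not the qdisc code: `struct pkt`, `struct pkt_cb`, `child_enqueue()` and `enqueue_and_account()` are hypothetical names, and the child here always frees the packet to model the worst case.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-packet control block and packet. */
struct pkt_cb {
	uint32_t hashes[2];
};

struct pkt {
	struct pkt_cb cb;
};

/* Models a child qdisc that consumes and frees the packet it is given. */
static int child_enqueue(struct pkt *p)
{
	free(p);
	return 0;
}

/* Snapshot the control block on the stack before handing the packet off,
 * and use only the snapshot afterwards. Returns the hash that the
 * bookkeeping step would use, or 0 if the child rejected the packet. */
static uint32_t enqueue_and_account(struct pkt *p)
{
	struct pkt_cb cb;

	memcpy(&cb, &p->cb, sizeof(cb)); /* p is still valid here */
	if (child_enqueue(p) != 0)
		return 0;
	return cb.hashes[0];             /* p must not be touched here */
}
```

Reading `p->cb` after `child_enqueue()` would be a use-after-free in this model, which is the situation the stack copy avoids.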
      
      Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
      Fixes: e13e02a3 ("net_sched: SFB flow scheduler")
      Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: phy: meson-gxl: improve link-up behavior" · 7fdc7766
      Heiner Kallweit authored
      This reverts commit 2c87c6f9.
      Meanwhile it turned out that the following commit is the proper
      workaround for the issue that 2c87c6f9 tries to address.
      a3a57bf0 ("net: stmmac: work around sporadic tx issue on link-up")
      It's not clear why the to-be-reverted commit helped for one user;
      for others it didn't make a difference.
      
      Fixes: 2c87c6f9 ("net: phy: meson-gxl: improve link-up behavior")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/8deeeddc-6b71-129b-1918-495a12dc11e3@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 01 Sep, 2022 6 commits
    • Merge tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 42e66b1c
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth, bpf and wireless.
      
        Current release - regressions:
      
         - bpf:
            - fix wrong last sg check in sk_msg_recvmsg()
            - fix kernel BUG in purge_effective_progs()
      
         - mac80211:
            - fix possible leak in ieee80211_tx_control_port()
            - potential NULL dereference in ieee80211_tx_control_port()
      
        Current release - new code bugs:
      
         - nfp: fix the access to management firmware hanging
      
        Previous releases - regressions:
      
         - ip: fix triggering of 'icmp redirect'
      
         - sched: tbf: don't call qdisc_put() while holding tree lock
      
         - bpf: fix corrupted packets for XDP_SHARED_UMEM
      
         - bluetooth: hci_sync: fix suspend performance regression
      
         - micrel: fix probe failure
      
        Previous releases - always broken:
      
         - tcp: make global challenge ack rate limitation per net-ns and
           default disabled
      
         - tg3: fix potential hang-up on system reboot
      
         - mac802154: fix reception for no-daddr packets
      
        Misc:
      
         - r8152: add PID for the lenovo onelink+ dock"
      
      * tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
        net/smc: Remove redundant refcount increase
        Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
        tcp: make global challenge ack rate limitation per net-ns and default disabled
        tcp: annotate data-race around challenge_timestamp
        net: dsa: hellcreek: Print warning only once
        ip: fix triggering of 'icmp redirect'
        sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
        selftests: net: sort .gitignore file
        Documentation: networking: correct possessive "its"
        kcm: fix strp_init() order and cleanup
        mlxbf_gige: compute MDIO period based on i1clk
        ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
        net: lan966x: improve error handle in lan966x_fdma_rx_get_frame()
        nfp: fix the access to management firmware hanging
        net: phy: micrel: Make the GPIO to be non-exclusive
        net: virtio_net: fix notification coalescing comments
        net/sched: fix netdevice reference leaks in attach_default_qdiscs()
        net: sched: tbf: don't call qdisc_put() while holding tree lock
        net: Use u64_stats_fetch_begin_irq() for stats fetch.
        net: dsa: xrs700x: Use irqsave variant for u64 stats update
        ...
      42e66b1c
    • Linus Torvalds's avatar
      Merge tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · d330076e
      Linus Torvalds authored
      Pull slab fix from Vlastimil Babka:
      
       - A fix from Waiman Long to avoid a theoretical deadlock reported by
         lockdep.
      
      * tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock
      d330076e
    • Linus Torvalds's avatar
      Merge tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 2880e1a1
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Just a handful of changes this time. The only major change is the
        regression fix about the x86 WC-page buffer allocation.
      
        The rest are trivial data-race fixes for ALSA sequencer core, the
        possible out-of-bounds access fixes in the new ALSA control hash code,
        and a few device-specific workarounds and fixes"
      
      * tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5
        ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298
        ALSA: control: Re-order bounds checking in get_ctl_id_hash()
        ALSA: control: Fix an out-of-bounds bug in get_ctl_id_hash()
        ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array
        ALSA: seq: Fix data-race at module auto-loading
        ALSA: seq: oss: Fix data-race for max_midi_devs access
        ALSA: memalloc: Revive x86-specific WC page allocations again
      2880e1a1
    • David Howells's avatar
      rxrpc: Remove rxrpc_get_reply_time() which is no longer used · 21457f4a
      David Howells authored
      Remove rxrpc_get_reply_time() as that is no longer used now that the call
      issue time is used instead of the reply time.
      Signed-off-by: David Howells <dhowells@redhat.com>
      21457f4a
    • David Howells's avatar
      afs: Use the operation issue time instead of the reply time for callbacks · 7903192c
      David Howells authored
      rxrpc and kafs between them try to use the receive timestamp on the first
      data packet (i.e. the one with sequence number 1) as a base from which to
      calculate the time at which callback promise and lock expiration occur.
      
      However, we don't know how long it took for the server to send us the reply
      from it having completed the basic part of the operation - it might then,
      for instance, have to send a bunch of callback breaks, depending on the
      particular operation.
      
      Fix this by using the time at which the operation is issued on the client
      as a base instead.  That should never be longer than the server's idea of
      the expiry time.
      
      Fixes: 78107055 ("afs: Fix calculation of callback expiry time")
      Fixes: 2070a3e4 ("rxrpc: Allow the reply time to be obtained on a client call")
      Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      7903192c
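
The reasoning behind switching the base time can be shown with a small arithmetic sketch. It assumes, purely for illustration, that client and server clocks agree and that times are whole seconds; `toy_time_t`, `expiry_from()` and the numbers in the usage note are hypothetical and do not come from the afs or rxrpc sources.

```c
#include <stdint.h>

/* Hypothetical time type: whole seconds on a clock shared by both sides. */
typedef int64_t toy_time_t;

/* Expiry computed as a base time plus the promised lifetime. */
toy_time_t expiry_from(toy_time_t base, uint32_t lifetime_secs)
{
    return base + (toy_time_t)lifetime_secs;
}

/* True if the client's idea of the expiry does not outlive the server's. */
int client_expiry_is_safe(toy_time_t client_expiry, toy_time_t server_expiry)
{
    return client_expiry <= server_expiry;
}
```

For example, suppose the client issues the operation at t=100, the server grants a 60 second promise at t=102, and the reply only arrives at t=110 because the server was busy in between. The server's expiry is 162. Basing the calculation on the reply time gives 170, which outlives the server's promise, while basing it on the issue time gives 160, which does not.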
    • David Howells's avatar
      rxrpc: Fix calc of resend age · 214a9dc7
      David Howells authored
      Fix the calculation of the resend age to add a microsecond value as
      microseconds, not nanoseconds.
      Signed-off-by: David Howells <dhowells@redhat.com>
      214a9dc7
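
The unit mix-up named in the resend-age fix is easy to reproduce in a sketch. The code below assumes a nanosecond-resolution timestamp held in a signed 64-bit integer; `toy_ktime_t`, `toy_add_us()` and `toy_add_raw()` are hypothetical names used for illustration and are not the kernel's ktime helpers or the rxrpc code.

```c
#include <stdint.h>

/* Hypothetical timestamp type: nanoseconds in a signed 64-bit integer. */
typedef int64_t toy_ktime_t;

#define TOY_NSEC_PER_USEC 1000

/* Correct: scale the microsecond offset to nanoseconds before adding. */
toy_ktime_t toy_add_us(toy_ktime_t t, uint64_t usecs)
{
    return t + (toy_ktime_t)usecs * TOY_NSEC_PER_USEC;
}

/* Buggy: treats the microsecond offset as if it were already nanoseconds. */
toy_ktime_t toy_add_raw(toy_ktime_t t, uint64_t usecs)
{
    return t + (toy_ktime_t)usecs;
}
```

With a base of 1,000,000 ns and an offset of 5 us, the scaled version yields 1,005,000 ns, whereas the unscaled one yields 1,000,005 ns, so the offset ends up a thousand times too small.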