1. 01 Mar, 2024 19 commits
  2. 29 Feb, 2024 21 commits
    • btrfs: fix double free of anonymous device after snapshot creation failure · e2b54eaf
      Filipe Manana authored
      When creating a snapshot we may do a double free of an anonymous device
      in case there's an error committing the transaction. The second free may
      result in freeing an anonymous device number that was allocated by some
      other subsystem in the kernel or another btrfs filesystem.
      
      The steps that lead to this:
      
      1) At ioctl.c:create_snapshot() we allocate an anonymous device number
         and assign it to pending_snapshot->anon_dev;
      
      2) Then we call btrfs_commit_transaction() and end up at
         transaction.c:create_pending_snapshot();
      
      3) There we call btrfs_get_new_fs_root() and pass it the anonymous device
         number stored in pending_snapshot->anon_dev;
      
      4) btrfs_get_new_fs_root() frees that anonymous device number because
         btrfs_lookup_fs_root() returned a root - someone else did a lookup
         of the new root already, which could be some task doing backref walking;
      
      5) After that some error happens in the transaction commit path, and at
         ioctl.c:create_snapshot() we jump to the 'fail' label, and after
         that we free again the same anonymous device number, which in the
         meanwhile may have been reallocated somewhere else, because
         pending_snapshot->anon_dev still has the same value as in step 1.
      
      Recently syzbot ran into this and reported the following trace:
      
        ------------[ cut here ]------------
        ida_free called for id=51 which is not allocated.
        WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525
        Modules linked in:
        CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
        RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525
        Code: 10 42 80 3c 28 (...)
        RSP: 0018:ffffc90015a67300 EFLAGS: 00010246
        RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000
        RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000
        RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4
        R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246
        R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246
        FS:  00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0
        Call Trace:
         <TASK>
         btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346
         create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837
         create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931
         btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404
         create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848
         btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998
         btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044
         __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306
         btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393
         btrfs_ioctl+0xa74/0xd40
         vfs_ioctl fs/ioctl.c:51 [inline]
         __do_sys_ioctl fs/ioctl.c:871 [inline]
         __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857
         do_syscall_64+0xfb/0x240
         entry_SYSCALL_64_after_hwframe+0x6f/0x77
        RIP: 0033:0x7fca3e67dda9
        Code: 28 00 00 00 (...)
        RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9
        RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003
        RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658
         </TASK>
      
      Here we get an explicit message that we attempted to free an anonymous
      device number that is not currently allocated. It happens in a different
      code path from the example above, at btrfs_get_root_ref(), so this change
      may not fix the case triggered by syzbot.
      
      To fix at least the code path from the example above, change
      btrfs_get_root_ref() and its callers to receive a dev_t pointer argument
      for the anonymous device number, so that in case it frees the number, it
      also resets it to 0, so that up in the call chain we don't attempt to do
      the double free.
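
The pointer-and-reset idea described above can be sketched in user space. This is a minimal model, not the btrfs code: the allocator, `toy_get_root_ref()` and `toy_create_snapshot()` are hypothetical stand-ins, and the only point illustrated is that the callee zeroes the caller's copy of the number when it frees it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the anonymous device number allocator (hypothetical). */
#define MAX_ANON_DEV 8
static bool anon_dev_in_use[MAX_ANON_DEV];
static int bogus_frees; /* frees of numbers that were not allocated */

static unsigned int toy_get_anon_dev(void)
{
	for (unsigned int id = 1; id < MAX_ANON_DEV; id++) {
		if (!anon_dev_in_use[id]) {
			anon_dev_in_use[id] = true;
			return id;
		}
	}
	return 0; /* 0 means "no device number" */
}

static void toy_free_anon_dev(unsigned int id)
{
	if (id == 0 || id >= MAX_ANON_DEV)
		return;
	if (!anon_dev_in_use[id]) {
		bogus_frees++; /* this is what a double free looks like */
		return;
	}
	anon_dev_in_use[id] = false;
}

/*
 * Callee: receives a pointer to the number. If it frees the number
 * (because the root was already looked up by someone else), it also
 * resets the caller's copy to 0.
 */
static int toy_get_root_ref(unsigned int *anon_dev, bool root_already_exists)
{
	if (root_already_exists && anon_dev != NULL && *anon_dev != 0) {
		toy_free_anon_dev(*anon_dev);
		*anon_dev = 0;
	}
	return 0;
}

/* Caller: models the error path that frees the number on failure. */
static int toy_create_snapshot(bool root_already_exists, bool commit_fails)
{
	unsigned int anon_dev = toy_get_anon_dev();
	int ret;

	if (anon_dev == 0)
		return -ENOSPC;

	toy_get_root_ref(&anon_dev, root_already_exists);

	ret = commit_fails ? -EIO : 0;
	if (ret)
		toy_free_anon_dev(anon_dev); /* no-op if the callee reset it */
	return ret;
}
```

Calling `toy_create_snapshot(true, true)` walks through steps 1 to 5 of the scenario; because the caller's copy was reset to 0, the free in the error path does nothing and `bogus_frees` stays at 0.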
      
      CC: stable@vger.kernel.org # 5.10+
      Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/
      Fixes: e03ee2fe ("btrfs: do not ASSERT() if the newly created subvolume already got read")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: ensure fiemap doesn't race with writes when FIEMAP_FLAG_SYNC is given · 418b0902
      Filipe Manana authored
      When FIEMAP_FLAG_SYNC is given to fiemap the expectation is that there
      are no concurrent writes and we get a stable view of the inode's extent
      layout.
      
      When the flag is given we flush all IO (and wait for ordered extents to
      complete) and then lock the inode in shared mode, however that leaves open
      the possibility that a write might happen right after the flushing and
      before locking the inode. So fix this by flushing again after locking the
      inode - we leave the initial flushing before locking the inode to avoid
      holding the lock and blocking other RO operations while waiting for IO
      and ordered extents to complete. The second flushing while holding the
      inode's lock will most of the time do nothing or very little since the
      time window for new writes to have happened is small.
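
The flush, lock, flush-again ordering described above can be modeled with a toy inode. This is only a sketch under simplifying assumptions: `struct toy_inode`, `toy_flush()` and `toy_write()` are hypothetical, the lock is a plain flag rather than a real shared lock, and the "racing write" is injected by hand.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode {
	int dirty_pages; /* data written but not yet flushed */
	bool locked;     /* models the inode lock held against writers */
	int flushes;     /* how many times we flushed */
};

static void toy_flush(struct toy_inode *inode)
{
	inode->dirty_pages = 0;
	inode->flushes++;
}

/* A write dirties a page unless the lock keeps writers out. */
static bool toy_write(struct toy_inode *inode)
{
	if (inode->locked)
		return false;
	inode->dirty_pages++;
	return true;
}

/* Returns the number of dirty pages visible while the lock is held. */
static int toy_fiemap_sync(struct toy_inode *inode, bool racing_write)
{
	int dirty_seen;
	bool blocked;

	toy_flush(inode);         /* first flush, done without the lock */
	if (racing_write)
		toy_write(inode); /* a write sneaks into the window */

	inode->locked = true;
	toy_flush(inode);         /* second flush, usually little or no work */

	blocked = !toy_write(inode); /* new writes cannot get in now */
	assert(blocked);

	dirty_seen = inode->dirty_pages;
	inode->locked = false;
	return dirty_seen;
}
```

With `racing_write` set, the first flush alone would leave one dirty page behind; the second flush under the lock brings the count back to zero before the extent layout is read.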
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix race between ordered extent completion and fiemap · a1a4a9ca
      Filipe Manana authored
      For fiemap we recently stopped locking the target extent range for the
      whole duration of the fiemap call, in order to avoid a deadlock in a
      scenario where the fiemap buffer happens to be a memory mapped range of
      the same file. This use case is very unlikely to be useful in practice but
      it may be triggered by fuzz testing (syzbot, etc).
      
      However by not locking the target extent range for the whole duration of
      the fiemap call we can race with an ordered extent. This happens like
      this:
      
      1) The fiemap task finishes processing a file extent item that covers
         the file range [512K, 1M[, and that file extent item is the last item
         in the leaf currently being processed;
      
      2) An ordered extent for the file range [768K, 2M[, in COW mode,
         completes (btrfs_finish_one_ordered()) and the file extent item
         covering the range [512K, 1M[ is trimmed to cover the range
         [512K, 768K[ and then a new file extent item for the range [768K, 2M[
         is inserted in the inode's subvolume tree;
      
      3) The fiemap task calls fiemap_next_leaf_item(), which then calls
         btrfs_next_leaf() to find the next leaf / item. This finds that the
         next key following the one we previously processed (its type is
         BTRFS_EXTENT_DATA_KEY and its offset is 512K), is the key corresponding
         to the new file extent item inserted by the ordered extent, which has
         a type of BTRFS_EXTENT_DATA_KEY and an offset of 768K;
      
      4) Later the fiemap code ends up at emit_fiemap_extent() and triggers
         the warning:
      
            if (cache->offset + cache->len > offset) {
                     WARN_ON(1);
                     return -EINVAL;
            }
      
         Since we get 1M > 768K, because the previously emitted entry for the
         old extent covering the file range [512K, 1M[ ends at an offset that
         is greater than the new extent's start offset (768K). This makes fiemap
         fail with -EINVAL besides triggering the warning that produces a stack
         trace like the following:
      
           [1621.677651] ------------[ cut here ]------------
           [1621.677656] WARNING: CPU: 1 PID: 204366 at fs/btrfs/extent_io.c:2492 emit_fiemap_extent+0x84/0x90 [btrfs]
           [1621.677899] Modules linked in: btrfs blake2b_generic (...)
           [1621.677951] CPU: 1 PID: 204366 Comm: pool Not tainted 6.8.0-rc5-btrfs-next-151+ #1
           [1621.677954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
           [1621.677956] RIP: 0010:emit_fiemap_extent+0x84/0x90 [btrfs]
           [1621.678033] Code: 2b 4c 89 63 (...)
           [1621.678035] RSP: 0018:ffffab16089ffd20 EFLAGS: 00010206
           [1621.678037] RAX: 00000000004fa000 RBX: ffffab16089ffe08 RCX: 0000000000009000
           [1621.678039] RDX: 00000000004f9000 RSI: 00000000004f1000 RDI: ffffab16089ffe90
           [1621.678040] RBP: 00000000004f9000 R08: 0000000000001000 R09: 0000000000000000
           [1621.678041] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000041d78000
           [1621.678043] R13: 0000000000001000 R14: 0000000000000000 R15: ffff9434f0b17850
           [1621.678044] FS:  00007fa6e20006c0(0000) GS:ffff943bdfa40000(0000) knlGS:0000000000000000
           [1621.678046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
           [1621.678048] CR2: 00007fa6b0801000 CR3: 000000012d404002 CR4: 0000000000370ef0
           [1621.678053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
           [1621.678055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
           [1621.678056] Call Trace:
           [1621.678074]  <TASK>
           [1621.678076]  ? __warn+0x80/0x130
           [1621.678082]  ? emit_fiemap_extent+0x84/0x90 [btrfs]
           [1621.678159]  ? report_bug+0x1f4/0x200
           [1621.678164]  ? handle_bug+0x42/0x70
           [1621.678167]  ? exc_invalid_op+0x14/0x70
           [1621.678170]  ? asm_exc_invalid_op+0x16/0x20
           [1621.678178]  ? emit_fiemap_extent+0x84/0x90 [btrfs]
           [1621.678253]  extent_fiemap+0x766/0xa30 [btrfs]
           [1621.678339]  btrfs_fiemap+0x45/0x80 [btrfs]
           [1621.678420]  do_vfs_ioctl+0x1e4/0x870
           [1621.678431]  __x64_sys_ioctl+0x6a/0xc0
           [1621.678434]  do_syscall_64+0x52/0x120
           [1621.678445]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      
      There's also another case where before calling btrfs_next_leaf() we are
      processing a hole or a prealloc extent and we had several delalloc ranges
      within that hole or prealloc extent. In that case if the ordered extents
      complete before we find the next key, we may end up finding an extent item
      with an offset smaller than (or equal to) the offset in cache->offset.
      
      So fix this by changing emit_fiemap_extent() to address these three
      scenarios like this:
      
      1) For the first case, steps listed above, adjust the length of the
         previously cached extent so that it does not overlap with the current
         extent, emit the previous one and cache the current file extent item;
      
      2) For the second case where we had a hole or prealloc extent with
         multiple delalloc ranges inside the hole or prealloc extent's range,
         and the current file extent item has an offset that matches the offset
         in the fiemap cache, just discard what we have in the fiemap cache and
         assign the current file extent item to the cache, since it's more up
         to date;
      
      3) For the third case where we had a hole or prealloc extent with
         multiple delalloc ranges inside the hole or prealloc extent's range
         and the offset of the file extent item we just found is smaller than
         what we have in the cache, just skip the current file extent item
         if its range ends at or before the cached extent's end, because we may
         have emitted (to the fiemap user space buffer) delalloc ranges that
         overlap with the current file extent item's range. If the file extent
         item's range goes beyond the end offset of the cached extent, just
         emit the cached extent and cache a subrange of the file extent item,
         that goes from the end offset of the cached extent to the end offset
         of the file extent item.
      
      Dealing with those cases in those ways makes everything consistent by
      reflecting the current state of file extent items in the btree and
      without emitting extents that have overlapping ranges (which would be
      confusing and violating expectations).
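
The three cases above can be sketched as a small user-space model. This is not the kernel's emit_fiemap_extent(): `struct toy_cache` and the helpers are hypothetical, flags and physical addresses are ignored, and the merging of adjacent extents that the real code may perform is left out. Ranges are half-open, matching the [start, end[ notation used above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_extent {
	uint64_t offset;
	uint64_t len;
};

struct toy_cache {
	struct toy_extent ext;         /* the currently cached extent */
	bool cached;
	struct toy_extent emitted[16]; /* what user space would have received */
	int nr_emitted;
};

static void toy_emit_cached(struct toy_cache *c)
{
	if (c->cached && c->nr_emitted < 16)
		c->emitted[c->nr_emitted++] = c->ext;
	c->cached = false;
}

static void toy_emit_fiemap_extent(struct toy_cache *c, uint64_t offset,
				   uint64_t len)
{
	if (c->cached) {
		const uint64_t cache_end = c->ext.offset + c->ext.len;

		if (cache_end <= offset) {
			/* No overlap: emit the cached extent as usual. */
			toy_emit_cached(c);
		} else if (c->ext.offset < offset) {
			/* Case 1: trim the cached extent, then emit it. */
			c->ext.len = offset - c->ext.offset;
			toy_emit_cached(c);
		} else if (c->ext.offset == offset) {
			/* Case 2: the new item is more up to date. */
			c->cached = false;
		} else {
			/* Case 3: the new item starts before the cache. */
			const uint64_t range_end = offset + len;

			if (range_end <= cache_end)
				return; /* fully covered, skip it */
			toy_emit_cached(c);
			offset = cache_end;
			len = range_end - cache_end;
		}
	}
	c->ext.offset = offset;
	c->ext.len = len;
	c->cached = true;
}
```

Feeding it the ranges from the scenario, [512K, 1M[ followed by [768K, 2M[, emits [512K, 768K[ and leaves [768K, 2M[ in the cache, so no two emitted ranges overlap.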
      
      This issue could be triggered often with test case generic/561, and was
      also hit and reported by Wang Yugui.
      Reported-by: Wang Yugui <wangyugui@e16-tech.com>
      Link: https://lore.kernel.org/linux-btrfs/20240223104619.701F.409509F4@e16-tech.com/
      Fixes: b0ad381f ("btrfs: fix deadlock with fiemap and extent locking")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 87adedeb
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, WiFi and netfilter.
      
        We have one outstanding issue with the stmmac driver, which may be a
        LOCKDEP false positive, not a blocker.
      
        Current release - regressions:
      
         - netfilter: nf_tables: re-allow NFPROTO_INET in
           nft_(match/target)_validate()
      
         - eth: ionic: fix error handling in PCI reset code
      
        Current release - new code bugs:
      
         - eth: stmmac: complete meta data only when enabled, fix null-deref
      
         - kunit: fix again checksum tests on big endian CPUs
      
        Previous releases - regressions:
      
         - veth: try harder when allocating queue memory
      
         - Bluetooth:
            - hci_bcm4377: do not mark valid bd_addr as invalid
            - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
      
        Previous releases - always broken:
      
         - info leak in __skb_datagram_iter() on netlink socket
      
         - mptcp:
            - map v4 address to v6 when destroying subflow
            - fix potential wake-up event loss due to sndbuf auto-tuning
            - fix double-free on socket dismantle
      
         - wifi: nl80211: reject iftype change with mesh ID change
      
         - fix small out-of-bound read when validating netlink be16/32 types
      
         - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
      
         - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
      
         - ip_tunnel: prevent perpetual headroom growth with huge number of
           tunnels on top of each other
      
         - mctp: fix skb leaks on error paths of mctp_local_output()
      
         - eth: ice: fixes for DPLL state reporting
      
         - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
      
         - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
           tree"
      
      * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
        dpll: fix build failure due to rcu_dereference_check() on unknown type
        kunit: Fix again checksum tests on big endian CPUs
        tls: fix use-after-free on failed backlog decryption
        tls: separate no-async decryption request handling from async
        tls: fix peeking with sync+async decryption
        tls: decrement decrypt_pending if no async completion will be called
        gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
        net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
        igb: extend PTP timestamp adjustments to i211
        rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
        tools: ynl: fix handling of multiple mcast groups
        selftests: netfilter: add bridge conntrack + multicast test case
        netfilter: bridge: confirm multicast packets before passing them up the stack
        netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
        Bluetooth: qca: Fix triggering coredump implementation
        Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
        Bluetooth: qca: Fix wrong event type for patch config command
        Bluetooth: Enforce validation on max value of connection interval
        Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
        Bluetooth: mgmt: Fix limited discoverable off timeout
        ...
    • Merge tag 'landlock-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux · d4f76f80
      Linus Torvalds authored
      Pull Landlock fix from Mickaël Salaün:
       "Fix a potential issue when handling inodes with inconsistent
        properties"
      
      * tag 'landlock-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
        landlock: Fix asymmetric private inodes referring
    • dpll: fix build failure due to rcu_dereference_check() on unknown type · 640f41ed
      Eric Dumazet authored
      Tasmiya reports that their compiler complains that we deref
      a pointer to unknown type with rcu_dereference_rtnl():
      
      include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’
      
      Unclear what compiler it is, at the moment, and we can't report
      but since DPLL can't be a module - move the code from the header
      into the source file.
      
      Fixes: 0d60d8df ("dpll: rely on rcu for netdev_dpll_pin()")
      Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com>
      Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • kunit: Fix again checksum tests on big endian CPUs · 3d6423ef
      Christophe Leroy authored
      Commit b38460bc ("kunit: Fix checksum tests on big endian CPUs")
      fixed endianness issues with kunit checksum tests, but then
      commit 6f4c45cb ("kunit: Add tests for csum_ipv6_magic and
      ip_fast_csum") introduced new issues on big endian CPUs. Those issues
      are once again reflected by the warnings reported by sparse.
      
      So, fix them with the same approach, perform proper conversion in
      order to support both little and big endian CPUs. Once the conversions
      are properly done and the right types used, the sparse warnings are
      cleared as well.
      Reported-by: Erhard Furtner <erhard_f@mailbox.org>
      Fixes: 6f4c45cb ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Tested-by: Charlie Jenkins <charlie@rivosinc.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
      Link: https://lore.kernel.org/r/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 244b96c2
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - mgmt: Fix limited discoverable off timeout
       - hci_qca: Set BDA quirk bit if fwnode exists in DT
       - hci_bcm4377: do not mark valid bd_addr as invalid
       - hci_sync: Check the correct flag before starting a scan
       - Enforce validation on max value of connection interval
       - hci_sync: Fix accept_list when attempting to suspend
       - hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
       - Avoid potential use-after-free in hci_error_reset
       - rfcomm: Fix null-ptr-deref in rfcomm_check_security
       - hci_event: Fix wrongly recorded wakeup BD_ADDR
       - qca: Fix wrong event type for patch config command
       - qca: Fix triggering coredump implementation
      
      * tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: qca: Fix triggering coredump implementation
        Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
        Bluetooth: qca: Fix wrong event type for patch config command
        Bluetooth: Enforce validation on max value of connection interval
        Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
        Bluetooth: mgmt: Fix limited discoverable off timeout
        Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
        Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
        Bluetooth: hci_sync: Fix accept_list when attempting to suspend
        Bluetooth: Avoid potential use-after-free in hci_error_reset
        Bluetooth: hci_sync: Check the correct flag before starting a scan
        Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid
      ====================
      
      Link: https://lore.kernel.org/r/20240228145644.2269088-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tls-a-few-more-fixes-for-async-decrypt' · 8f5afe41
      Jakub Kicinski authored
      Sabrina Dubroca says:
      
      ====================
      tls: a few more fixes for async decrypt
      
      The previous patchset [1] took care of "full async". This adds a few
      fixes for cases where only part of the crypto operations go the async
      route, found by extending my previous debug patch [2] to do N
      synchronous operations followed by M asynchronous ops (with N and M
      configurable).
      
      [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=823784&state=*
      [2] https://lore.kernel.org/all/9d664093b1bf7f47497b2c40b3a085b45f3274a2.1694021240.git.sd@queasysnail.net/
      ====================
      
      Link: https://lore.kernel.org/r/cover.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: fix use-after-free on failed backlog decryption · 13114dc5
      Sabrina Dubroca authored
      When the decrypt request goes to the backlog and crypto_aead_decrypt
      returns -EBUSY, tls_do_decryption will wait until all async
      decryptions have completed. If one of them fails, tls_do_decryption
      will return -EBADMSG and tls_decrypt_sg jumps to the error path,
      releasing all the pages. But the pages have been passed to the async
      callback, and have already been released by tls_decrypt_done.
      
      The only true async case is when crypto_aead_decrypt returns
       -EINPROGRESS. With -EBUSY, we already waited so we can tell
      tls_sw_recvmsg that the data is available for immediate copy, but we
      need to notify tls_decrypt_sg (via the new ->async_done flag) that the
      memory has already been released.
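
The ownership hand-off described above can be sketched with a toy model. The names below (`struct toy_darg`, `toy_async_callback()`, `toy_do_decryption()`, `toy_decrypt_sg()`) are hypothetical and only loosely mirror the functions named in the message; the model just counts how often the pages are released and uses an `async_done` flag to avoid a second release.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_darg {
	bool async_done;   /* set when the async callback already ran */
	int page_releases; /* must never exceed 1 per request */
};

/* Models the async completion callback, which releases the pages. */
static void toy_async_callback(struct toy_darg *darg)
{
	darg->page_releases++;
}

/*
 * submit_ret models the return value of the crypto request,
 * wait_ret the result collected after waiting on a backlogged request.
 */
static int toy_do_decryption(struct toy_darg *darg, int submit_ret,
			     int wait_ret)
{
	if (submit_ret == -EINPROGRESS)
		return -EINPROGRESS; /* truly async, callback runs later */

	if (submit_ret == -EBUSY) {
		/* Backlogged: we waited, so the callback has already run. */
		toy_async_callback(darg);
		darg->async_done = true;
		return wait_ret;
	}
	return submit_ret; /* synchronous result */
}

static int toy_decrypt_sg(struct toy_darg *darg, int submit_ret, int wait_ret)
{
	int err = toy_do_decryption(darg, submit_ret, wait_ret);

	if (err == -EINPROGRESS)
		return 0;   /* the callback owns the pages */
	if (darg->async_done)
		return err; /* pages already released by the callback */

	darg->page_releases++; /* synchronous path releases them here */
	return err;
}
```

In the failing backlog case, `toy_decrypt_sg(&d, -EBUSY, -EBADMSG)` returns `-EBADMSG` with exactly one release; without the flag check the error path would release the pages a second time.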
      
      Fixes: 85905414 ("net: tls: handle backlogging of crypto requests")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/4755dd8d9bebdefaa19ce1439b833d6199d4364c.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: separate no-async decryption request handling from async · 41532b78
      Sabrina Dubroca authored
      If we're not doing async, the handling is much simpler. There's no
      reference counting, we just need to wait for the completion to wake us
      up and return its result.
      
      We should preferably also use a separate crypto_wait. I'm not seeing a
      UAF as I did in the past, I think aec79619 ("tls: fix race between
      async notify and socket close") took care of it.
      
      This will make the next fix easier.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/47bde5f649707610eaef9f0d679519966fc31061.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: fix peeking with sync+async decryption · 6caaf104
      Sabrina Dubroca authored
      If we peek from 2 records with a currently empty rx_list, and the
      first record is decrypted synchronously but the second record is
      decrypted async, the following happens:
        1. decrypt record 1 (sync)
        2. copy from record 1 to the userspace's msg
        3. queue the decrypted record to rx_list for future read(!PEEK)
        4. decrypt record 2 (async)
        5. queue record 2 to rx_list
        6. call process_rx_list to copy data from the 2nd record
      
      We currently pass copied=0 as skip offset to process_rx_list, so we
      end up copying once again from the first record. We should skip over
      the data we've already copied.
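
The effect of the skip offset can be illustrated with a toy copy routine over a list of records. `toy_process_rx_list()` is a hypothetical stand-in that treats records as plain strings; it is not the kernel's process_rx_list(), only a sketch of why passing 0 re-copies the first record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy up to len bytes from the concatenation of the records into dst,
 * starting skip bytes into the stream. Returns the number of bytes copied.
 */
static size_t toy_process_rx_list(const char *const *records,
				  size_t nr_records, size_t skip,
				  char *dst, size_t len)
{
	size_t copied = 0;

	assert(dst != NULL);
	for (size_t i = 0; i < nr_records && copied < len; i++) {
		size_t rlen = strlen(records[i]);
		size_t chunk;

		if (skip >= rlen) {
			skip -= rlen; /* whole record was already consumed */
			continue;
		}
		chunk = rlen - skip;
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(dst + copied, records[i] + skip, chunk);
		copied += chunk;
		skip = 0;
	}
	return copied;
}
```

With records "AAAA" and "BBBB" queued and 4 bytes already copied from the first one, a skip of 0 yields "AAAA" again, while a skip of 4 yields "BBBB", which is the data the peek should return next.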
      
      Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/1b132d2b2b99296bfde54e8a67672d90d6d16e71.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: decrement decrypt_pending if no async completion will be called · f7fa16d4
      Sabrina Dubroca authored
      With mixed sync/async decryption, or failures of crypto_aead_decrypt,
      we increment decrypt_pending but we never do the corresponding
      decrement since tls_decrypt_done will not be called. In this case, we
      should decrement decrypt_pending immediately to avoid getting stuck.
      
      For example, the prequeue test gets stuck with mixed
      modes (one async decrypt + one sync decrypt).
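
The counter imbalance can be sketched with a toy pending counter. `struct toy_tls_ctx`, `toy_submit_decrypt()` and `toy_completion()` are hypothetical, and a plain int stands in for whatever atomic type the real code uses; the sketch assumes a completion callback runs only for requests that returned -EINPROGRESS or -EBUSY.

```c
#include <assert.h>
#include <errno.h>

struct toy_tls_ctx {
	int decrypt_pending; /* requests whose completion is still expected */
};

/* crypto_ret models the return value of submitting the decrypt request. */
static int toy_submit_decrypt(struct toy_tls_ctx *ctx, int crypto_ret)
{
	ctx->decrypt_pending++;

	/* A completion callback will run later and do the decrement. */
	if (crypto_ret == -EINPROGRESS || crypto_ret == -EBUSY)
		return crypto_ret;

	/*
	 * Synchronous success or failure: no callback will ever run,
	 * so undo the increment right away to avoid waiting forever.
	 */
	ctx->decrypt_pending--;
	return crypto_ret;
}

/* Models the async completion callback. */
static void toy_completion(struct toy_tls_ctx *ctx)
{
	assert(ctx->decrypt_pending > 0);
	ctx->decrypt_pending--;
}
```

After one async and one sync submission followed by a single completion, the counter is back at 0; if the immediate decrement were missing it would stay at 1 and a waiter would never be woken.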
      
      Fixes: 94524d8f ("net/tls: Add support for async decryption of tls records")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.1709132643.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gtp: fix use-after-free and null-ptr-deref in gtp_newlink() · 616d82c3
      Alexander Ofitserov authored
      The gtp_link_ops operations structure for the subsystem must be
      registered after registering the gtp_net_ops pernet operations structure.
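
The ordering rule stated above can be sketched as follows. All functions here are hypothetical stubs, not the real registration APIs: the model only shows that the per-namespace state is set up before the link operations become reachable, and that a failure unwinds in reverse order.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool pernet_registered;   /* per-namespace state is available */
static bool link_ops_registered; /* newlink requests can reach the driver */

static int toy_register_pernet(void)
{
	pernet_registered = true;
	return 0;
}

static void toy_unregister_pernet(void)
{
	pernet_registered = false;
}

static int toy_register_link_ops(bool fail)
{
	if (fail)
		return -ENOMEM;
	link_ops_registered = true;
	return 0;
}

/* Models a newlink handler that depends on the per-namespace state. */
static int toy_newlink(void)
{
	if (!link_ops_registered)
		return -ENODEV;
	if (!pernet_registered)
		return -EFAULT; /* stands in for the crash in the trace */
	return 0;
}

static int toy_init(bool link_ops_fail)
{
	int err;

	/* Per-namespace state first, so newlink never sees it missing. */
	err = toy_register_pernet();
	if (err)
		return err;

	err = toy_register_link_ops(link_ops_fail);
	if (err) {
		toy_unregister_pernet(); /* unwind in reverse order */
		return err;
	}
	return 0;
}
```

With this order there is no window in which `toy_newlink()` can run while `pernet_registered` is still false; registering the link operations first would open exactly that window.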
      
      Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
      
      [ 1010.702740] gtp: GTP module unloaded
      [ 1010.715877] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI
      [ 1010.715888] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      [ 1010.715895] CPU: 1 PID: 128616 Comm: a.out Not tainted 6.8.0-rc6-std-def-alt1 #1
      [ 1010.715899] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014
      [ 1010.715908] RIP: 0010:gtp_newlink+0x4d7/0x9c0 [gtp]
      [ 1010.715915] Code: 80 3c 02 00 0f 85 41 04 00 00 48 8b bb d8 05 00 00 e8 ed f6 ff ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4f 04 00 00 4c 89 e2 4c 8b 6d 00 48 b8 00 00 00
      [ 1010.715920] RSP: 0018:ffff888020fbf180 EFLAGS: 00010203
      [ 1010.715929] RAX: dffffc0000000000 RBX: ffff88800399c000 RCX: 0000000000000000
      [ 1010.715933] RDX: 0000000000000001 RSI: ffffffff84805280 RDI: 0000000000000282
      [ 1010.715938] RBP: 000000000000000d R08: 0000000000000001 R09: 0000000000000000
      [ 1010.715942] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88800399cc80
      [ 1010.715947] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000400
      [ 1010.715953] FS:  00007fd1509ab5c0(0000) GS:ffff88805b300000(0000) knlGS:0000000000000000
      [ 1010.715958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1010.715962] CR2: 0000000000000000 CR3: 000000001c07a000 CR4: 0000000000750ee0
      [ 1010.715968] PKRU: 55555554
      [ 1010.715972] Call Trace:
      [ 1010.715985]  ? __die_body.cold+0x1a/0x1f
      [ 1010.715995]  ? die_addr+0x43/0x70
      [ 1010.716002]  ? exc_general_protection+0x199/0x2f0
      [ 1010.716016]  ? asm_exc_general_protection+0x1e/0x30
      [ 1010.716026]  ? gtp_newlink+0x4d7/0x9c0 [gtp]
      [ 1010.716034]  ? gtp_net_exit+0x150/0x150 [gtp]
      [ 1010.716042]  __rtnl_newlink+0x1063/0x1700
      [ 1010.716051]  ? rtnl_setlink+0x3c0/0x3c0
      [ 1010.716063]  ? is_bpf_text_address+0xc0/0x1f0
      [ 1010.716070]  ? kernel_text_address.part.0+0xbb/0xd0
      [ 1010.716076]  ? __kernel_text_address+0x56/0xa0
      [ 1010.716084]  ? unwind_get_return_address+0x5a/0xa0
      [ 1010.716091]  ? create_prof_cpu_mask+0x30/0x30
      [ 1010.716098]  ? arch_stack_walk+0x9e/0xf0
      [ 1010.716106]  ? stack_trace_save+0x91/0xd0
      [ 1010.716113]  ? stack_trace_consume_entry+0x170/0x170
      [ 1010.716121]  ? __lock_acquire+0x15c5/0x5380
      [ 1010.716139]  ? mark_held_locks+0x9e/0xe0
      [ 1010.716148]  ? kmem_cache_alloc_trace+0x35f/0x3c0
      [ 1010.716155]  ? __rtnl_newlink+0x1700/0x1700
      [ 1010.716160]  rtnl_newlink+0x69/0xa0
      [ 1010.716166]  rtnetlink_rcv_msg+0x43b/0xc50
      [ 1010.716172]  ? rtnl_fdb_dump+0x9f0/0x9f0
      [ 1010.716179]  ? lock_acquire+0x1fe/0x560
      [ 1010.716188]  ? netlink_deliver_tap+0x12f/0xd50
      [ 1010.716196]  netlink_rcv_skb+0x14d/0x440
      [ 1010.716202]  ? rtnl_fdb_dump+0x9f0/0x9f0
      [ 1010.716208]  ? netlink_ack+0xab0/0xab0
      [ 1010.716213]  ? netlink_deliver_tap+0x202/0xd50
      [ 1010.716220]  ? netlink_deliver_tap+0x218/0xd50
      [ 1010.716226]  ? __virt_addr_valid+0x30b/0x590
      [ 1010.716233]  netlink_unicast+0x54b/0x800
      [ 1010.716240]  ? netlink_attachskb+0x870/0x870
      [ 1010.716248]  ? __check_object_size+0x2de/0x3b0
      [ 1010.716254]  netlink_sendmsg+0x938/0xe40
      [ 1010.716261]  ? netlink_unicast+0x800/0x800
      [ 1010.716269]  ? __import_iovec+0x292/0x510
      [ 1010.716276]  ? netlink_unicast+0x800/0x800
      [ 1010.716284]  __sock_sendmsg+0x159/0x190
      [ 1010.716290]  ____sys_sendmsg+0x712/0x880
      [ 1010.716297]  ? sock_write_iter+0x3d0/0x3d0
      [ 1010.716304]  ? __ia32_sys_recvmmsg+0x270/0x270
      [ 1010.716309]  ? lock_acquire+0x1fe/0x560
      [ 1010.716315]  ? drain_array_locked+0x90/0x90
      [ 1010.716324]  ___sys_sendmsg+0xf8/0x170
      [ 1010.716331]  ? sendmsg_copy_msghdr+0x170/0x170
      [ 1010.716337]  ? lockdep_init_map_type+0x2c7/0x860
      [ 1010.716343]  ? lockdep_hardirqs_on_prepare+0x430/0x430
      [ 1010.716350]  ? debug_mutex_init+0x33/0x70
      [ 1010.716360]  ? percpu_counter_add_batch+0x8b/0x140
      [ 1010.716367]  ? lock_acquire+0x1fe/0x560
      [ 1010.716373]  ? find_held_lock+0x2c/0x110
      [ 1010.716384]  ? __fd_install+0x1b6/0x6f0
      [ 1010.716389]  ? lock_downgrade+0x810/0x810
      [ 1010.716396]  ? __fget_light+0x222/0x290
      [ 1010.716403]  __sys_sendmsg+0xea/0x1b0
      [ 1010.716409]  ? __sys_sendmsg_sock+0x40/0x40
      [ 1010.716419]  ? lockdep_hardirqs_on_prepare+0x2b3/0x430
      [ 1010.716425]  ? syscall_enter_from_user_mode+0x1d/0x60
      [ 1010.716432]  do_syscall_64+0x30/0x40
      [ 1010.716438]  entry_SYSCALL_64_after_hwframe+0x62/0xc7
      [ 1010.716444] RIP: 0033:0x7fd1508cbd49
      [ 1010.716452] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ef 70 0d 00 f7 d8 64 89 01 48
      [ 1010.716456] RSP: 002b:00007fff18872348 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      [ 1010.716463] RAX: ffffffffffffffda RBX: 000055f72bf0eac0 RCX: 00007fd1508cbd49
      [ 1010.716468] RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000006
      [ 1010.716473] RBP: 00007fff18872360 R08: 00007fff18872360 R09: 00007fff18872360
      [ 1010.716478] R10: 00007fff18872360 R11: 0000000000000202 R12: 000055f72bf0e1b0
      [ 1010.716482] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [ 1010.716491] Modules linked in: gtp(+) udp_tunnel ib_core uinput af_packet rfkill qrtr joydev hid_generic usbhid hid kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel nls_utf8 snd_intel_dspcfg nls_cp866 psmouse aesni_intel vfat crypto_simd fat cryptd glue_helper snd_hda_codec pcspkr snd_hda_core i2c_i801 snd_hwdep i2c_smbus xhci_pci snd_pcm lpc_ich xhci_pci_renesas xhci_hcd qemu_fw_cfg tiny_power_button button sch_fq_codel vboxvideo drm_vram_helper drm_ttm_helper ttm vboxsf vboxguest snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer snd soundcore msr fuse efi_pstore dm_mod ip_tables x_tables autofs4 virtio_gpu virtio_dma_buf drm_kms_helper cec rc_core drm virtio_rng virtio_scsi rng_core virtio_balloon virtio_blk virtio_net virtio_console net_failover failover ahci libahci libata evdev scsi_mod input_leds serio_raw virtio_pci intel_agp
      [ 1010.716674]  virtio_ring intel_gtt virtio [last unloaded: gtp]
      [ 1010.716693] ---[ end trace 04990a4ce61e174b ]---
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Ofitserov <oficerovas@altlinux.org>
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240228114703.465107-1-oficerovas@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      616d82c3
    • Priyanka Dandamudi's avatar
      drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace · 8188cae3
      Priyanka Dandamudi authored
      Add the move_lacks_source detail to the xe_bo_move trace to make it more
      readable, that is, to show whether the move is a migrate clear or a migrate copy.
      
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Fixes: a09946a9 ("drm/xe/xe_bo_move: Enhance xe_bo_move trace")
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240221101950.1019312-1-priyanka.dandamudi@intel.com
      (cherry picked from commit 8034f6b0)
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    • Paolo Abeni's avatar
      Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · b611b776
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
      
      Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
      packets.
      
      There is a day 0 bug in br_netfilter when used with connection tracking.
      
      Conntrack assumes that an nf_conn structure that is not yet added to
      hash table ("unconfirmed"), is only visible by the current cpu that is
      processing the sk_buff.
      
      For bridge this isn't true, sk_buff can get cloned in between, and
      clones can be processed in parallel on different cpu.
      
      This patch disables NAT and conntrack helpers for multicast packets.
      
      Patch #3 adds a selftest to cover for the br_netfilter bug.
      
      netfilter pull request 24-02-29
      
      * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: netfilter: add bridge conntrack + multicast test case
        netfilter: bridge: confirm multicast packets before passing them up the stack
        netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
      ====================
      
      Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Lukasz Majewski's avatar
      net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames · 51dd4ee0
      Lukasz Majewski authored
      The current HSR implementation uses the following supervisory frame (even
      for HSRv1 the HSR tag is not present):
      
      00000000: 01 15 4e 00 01 2d XX YY ZZ 94 77 10 88 fb 00 01
      00000010: 7e 1c 17 06 XX YY ZZ 94 77 10 1e 06 XX YY ZZ 94
      00000020: 77 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000030: 00 00 00 00 00 00 00 00 00 00 00 00
      
      The current code adds two extra bytes (i.e. sizeof(struct hsr_sup_tlv))
      when the offset for skb_pull() is calculated.
      This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp'
      already have 'struct hsr_sup_tag' defined in them, so there is no need
      to add two extra bytes.
      
      This code appeared to work correctly because, with no RedBox support, the
      check for HSR_TLV_EOT (0x00) was off by two bytes and landed on the zeroed
      padding bytes added to reach the minimal packet size.
      
      Fixes: eafaa88b ("net: hsr: Add support for redbox supervision frames")
      Signed-off-by: Lukasz Majewski <lukma@denx.de>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Mika Kuoppala's avatar
      drm/xe: Deny unbinds if uapi ufence pending · 785f4cc0
      Mika Kuoppala authored
      If a user fence was provided for MAP in vm_bind_ioctl
      and it has still not been signalled, deny UNMAP of said
      vma with EBUSY as long as the unsignalled fence exists.
      
      This guarantees that MAP vs UNMAP sequences won't
      escape under the radar if we ever want to track the
      client's state wrt completed and accessible MAPs,
      by means of intercepting the ufence release signalling.
      
      v2: find ufence with num_fences > 1 (Matt)
      v3: careful on clearing vma ufence (Matt)
      
      Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1159
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240215181152.450082-3-mika.kuoppala@linux.intel.com
      (cherry picked from commit 158900ad)
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    • Mika Kuoppala's avatar
      drm/xe: Expose user fence from xe_sync_entry · 86b3cd6d
      Mika Kuoppala authored
      By allowing a reference to the user fence to be taken, we can
      control its lifetime outside of sync entries.
      
      This is needed to allow vma to track the associated
      user fence that was provided with bind ioctl.
      
      v2: xe_user_fence can be kept opaque (Jani, Matt)
      v3: indent fix (Matt)
      
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240215181152.450082-2-mika.kuoppala@linux.intel.com
      (cherry picked from commit 977e5b82)
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    • Lucas De Marchi's avatar
      drm/xe: Use pointers in trace events · 4ca5c829
      Lucas De Marchi authored
      Commit a0df2cc8 ("drm/xe/xe_bo_move: Enhance xe_bo_move trace")
      inadvertently reverted commit 8d038f49 ("drm/xe: Fix cast on trace
      variable"), breaking the build on 32bits.
      
      As noted by Ville, there's no point in converting the pointers to u64
      and adding casts everywhere. In fact, it's better to just use %p and let
      the address be hashed. Convert all the cases in xe_trace.h to use
      pointers.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
      Cc: Oak Zeng <oak.zeng@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240222144125.2862546-1-lucas.demarchi@intel.com
      (cherry picked from commit 7a975748)
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    • Priyanka Dandamudi's avatar
      drm/xe/xe_bo_move: Enhance xe_bo_move trace · a09946a9
      Priyanka Dandamudi authored
      Enhance the xe_bo_move trace to be more readable.
      It now helps to show the migration details,
      namely the src and dst details.
      
      v2: Modify trace_xe_bo_move(), it takes the integer mem_type
      rather than a string.
      Make mem_type_to_name() extern, it will be used by trace. (Thomas)
      
      v3: Move mem_type_to_name() to xe_bo.[ch] (Thomas, Matt)
      
      v4: Add device details to reduce ambiguity related to vram0/vram1. (Oak)
      
      v5: Rename mem_type_to_name to xe_mem_type_to_name. (Thomas)
      
      v6: Optimised code to use xe_bo_device(__entry->bo). (Thomas)
      
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Oak Zeng <oak.zeng@intel.com>
      Cc: Kempczynski Zbigniew <Zbigniew.Kempczynski@intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Brian Welty <brian.welty@intel.com>
      Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
      Reviewed-by: Oak Zeng <oak.zeng@intel.com>
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240220044748.948496-1-priyanka.dandamudi@intel.com
      (cherry picked from commit a0df2cc8)
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>