- 14 Aug, 2024 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinuxLinus Torvalds authored
Pull selinux fixes from Paul Moore: - Fix a xperms counting problem where we adding to the xperms count even if we failed to add the xperm. - Propogate errors from avc_add_xperms_decision() back to the caller so that we can trigger the proper cleanup and error handling. - Revert our use of vma_is_initial_heap() in favor of our older logic as vma_is_initial_heap() doesn't correctly handle the no-heap case and it is causing issues with the SELinux process/execheap access control. While the older SELinux logic may not be perfect, it restores the expected user visible behavior. Hopefully we will be able to resolve the problem with the vma_is_initial_heap() macro with the mm folks, but we need to fix this in the meantime. * tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: revert our use of vma_is_initial_heap() selinux: add the processing of the failure of avc_add_xperms_decision() selinux: fix potential counting error in avc_add_xperms_decision()
-
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds authored
Pull vfs fixes from Christian Brauner: "VFS: - Fix the name of file lease slab cache. When file leases were split out of file locks the name of the file lock slab cache was used for the file leases slab cache as well. - Fix a type in take_fd() helper. - Fix infinite directory iteration for stable offsets in tmpfs. - When the icache is pruned all reclaimable inodes are marked with I_FREEING and other processes that try to lookup such inodes will block. But some filesystems like ext4 can trigger lookups in their inode evict callback causing deadlocks. Ext4 does such lookups if the ea_inode feature is used whereby a separate inode may be used to store xattrs. Introduce I_LRU_ISOLATING which pins the inode while its pages are reclaimed. This avoids inode deletion during inode_lru_isolate() avoiding the deadlock and evict is made to wait until I_LRU_ISOLATING is done. netfs: - Fault in smaller chunks for non-large folio mappings for filesystems that haven't been converted to large folios yet. - Fix the CONFIG_NETFS_DEBUG config option. The config option was renamed a short while ago and that introduced two minor issues. First, it depended on CONFIG_NETFS whereas it wants to depend on CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter does. Second, the documentation for the config option wasn't fixed up. - Revert the removal of the PG_private_2 writeback flag as ceph is using it and fix how that flag is handled in netfs. - Fix DIO reads on 9p. A program watching a file on a 9p mount wouldn't see any changes in the size of the file being exported by the server if the file was changed directly in the source filesystem. Fix this by attempting to read the full size specified when a DIO read is requested. - Fix a NULL pointer dereference bug due to a data race where a cachefiles cookies was retired even though it was still in use. Check the cookie's n_accesses counter before discarding it. nsfs: - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as the kernel is writing to userspace. pidfs: - Prevent the creation of pidfds for kthreads until we have a use-case for it and we know the semantics we want. It also confuses userspace why they can get pidfds for kthreads. squashfs: - Fix an unitialized value bug reported by KMSAN caused by a corrupted symbolic link size read from disk. Check that the symbolic link size is not larger than expected" * tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Squashfs: sanity check symbolic link size 9p: Fix DIO read through netfs vfs: Don't evict inode under the inode lru traversing context netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag" file: fix typo in take_fd() comment pidfd: prevent creation of pidfds for kthreads netfs: clean up after renaming FSCACHE_DEBUG config libfs: fix infinite directory reads for offset dir nsfs: fix ioctl declaration fs/netfs/fscache_cookie: add missing "n_accesses" check filelock: fix name of file_lease slab cache netfs: Fault in smaller chunks for non-large folio mappings
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds authored
Pull bpf fixes from Alexei Starovoitov: - Fix bpftrace regression from Kyle Huey. Tracing bpf prog was called with perf_event input arguments causing bpftrace produce garbage output. - Fix verifier crash in stacksafe() from Yonghong Song. Daniel Hodges reported verifier crash when playing with sched-ext. The stack depth in the known verifier state was larger than stack depth in being explored state causing out-of-bounds access. - Fix update of freplace prog in prog_array from Leon Hwang. freplace prog type wasn't recognized correctly. * tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: perf/bpf: Don't call bpf_overflow_handler() for tracing events selftests/bpf: Add a test to verify previous stacksafe() fix bpf: Fix a kernel verifier crash in stacksafe() bpf: Fix updating attached freplace prog in prog_array map
-
- 13 Aug, 2024 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull execve fixes from Kees Cook: - binfmt_flat: Fix corruption when not offsetting data start - exec: Fix ToCToU between perm check and set-uid/gid usage * tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Fix ToCToU between perm check and set-uid/gid usage binfmt_flat: Fix corruption when not offsetting data start
-
Kees Cook authored
When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges. For example, if a file could change permissions from executable and not set-id: ---------x 1 root root 16048 Aug 7 13:16 target to set-id and non-executable: ---S------ 1 root root 16048 Aug 7 13:16 target it is possible to gain root privileges when execution should have been disallowed. While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target becomes: -rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target But racing the chmod means users without group "cdrom" membership can get the permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root". Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against dead-locks, but hardly optimal. Reported-by: Marco Vanotti <mvanotti@google.com> Tested-by: Marco Vanotti <mvanotti@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Kees Cook <kees@kernel.org>
-
Kyle Huey authored
The regressing commit is new in 6.10. It assumed that anytime event->prog is set bpf_overflow_handler() should be invoked to execute the attached bpf program. This assumption is false for tracing events, and as a result the regressing commit broke bpftrace by invoking the bpf handler with garbage inputs on overflow. Prior to the regression the overflow handlers formed a chain (of length 0, 1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog() (the tracing case) did not. Both set event->prog. The chain of overflow handlers was replaced by a single overflow handler slot and a fixed call to bpf_overflow_handler() when appropriate. This modifies the condition there to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the previous behavior and fixing bpftrace. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Reported-by: Joe Damato <jdamato@fastly.com> Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/ Fixes: f11f10bf ("perf/bpf: Call BPF handler directly, not through overflow machinery") Cc: stable@vger.kernel.org Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
git://git.samba.org/ksmbdLinus Torvalds authored
Pull smb server fixes from Steve French: "Two smb3 server fixes for access denied problem on share path checks" * tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: override fsids for smb2_query_info() ksmbd: override fsids for share path check
-
Phillip Lougher authored
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug. This is caused by an uninitialised page, which is ultimately caused by a corrupted symbolic link size read from disk. The reason why the corrupted symlink size causes an uninitialised page is due to the following sequence of events: 1. squashfs_read_inode() is called to read the symbolic link from disk. This assigns the corrupted value 3875536935 to inode->i_size. 2. Later squashfs_symlink_read_folio() is called, which assigns this corrupted value to the length variable, which being a signed int, overflows producing a negative number. 3. The following loop that fills in the page contents checks that the copied bytes is less than length, which being negative means the loop is skipped, producing an uninitialised page. This patch adds a sanity check which checks that the symbolic link size is not larger than expected. -- Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Link: https://lore.kernel.org/r/20240811232821.13903-1-phillip@squashfs.org.ukReported-by: Lizhi Xu <lizhi.xu@windriver.com> Reported-by: syzbot+24ac24ff58dc5b0d26b9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/ V2: fix spelling mistake. Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Dominique Martinet authored
If a program is watching a file on a 9p mount, it won't see any change in size if the file being exported by the server is changed directly in the source filesystem, presumably because 9p doesn't have change notifications, and because netfs skips the reads if the file is empty. Fix this by attempting to read the full size specified when a DIO read is requested (such as when 9p is operating in unbuffered mode) and dealing with a short read if the EOF was less than the expected read. To make this work, filesystems using netfslib must not set NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF. I don't want to mandatorily clear this flag in netfslib for DIO because, say, ceph might make a read from an object that is not completely filled, but does not reside at the end of file - and so we need to clear the excess. This can be tested by watching an empty file over 9p within a VM (such as in the ktest framework): while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo then writing something into the empty file. The watcher should immediately display the file content and break out of the loop. Without this fix, it remains in the loop indefinitely. Fixes: 80105ed2 ("9p: Use netfslib read/write_iter") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1229195.1723211769@warthog.procyon.org.uk cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Zhihao Cheng authored
The inode reclaiming process(See function prune_icache_sb) collects all reclaimable inodes and mark them with I_FREEING flag at first, at that time, other processes will be stuck if they try getting these inodes (See function find_inode_fast), then the reclaiming process destroy the inodes by function dispose_list(). Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may do inode lookup in the inode evicting callback function, if the inode lookup is operated under the inode lru traversing context, deadlock problems may happen. Case 1: In function ext4_evict_inode(), the ea inode lookup could happen if ea_inode feature is enabled, the lookup process will be stuck under the evicting context like this: 1. File A has inode i_reg and an ea inode i_ea 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea 3. Then, following three processes running like this: PA PB echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // i_reg is added into lru, lru->i_ea->i_reg prune_icache_sb list_lru_walk_one inode_lru_isolate i_ea->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(i_reg) spin_unlock(&i_reg->i_lock) spin_unlock(lru_lock) rm file A i_reg->nlink = 0 iput(i_reg) // i_reg->nlink is 0, do evict ext4_evict_inode ext4_xattr_delete_inode ext4_xattr_inode_dec_ref_all ext4_xattr_inode_iget ext4_iget(i_ea->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(i_ea) ----→ AA deadlock dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&i_ea->i_state) Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file deleting process holds BASEHD's wbuf->io_mutex while getting the xattr inode, which could race with inode reclaiming process(The reclaiming process could try locking BASEHD's wbuf->io_mutex in inode evicting function), then an ABBA deadlock problem would happen as following: 1. File A has inode ia and a xattr(with inode ixa), regular file B has inode ib and a xattr. 2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa 3. Then, following three processes running like this: PA PB PC echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // ib and ia are added into lru, lru->ixa->ib->ia prune_icache_sb list_lru_walk_one inode_lru_isolate ixa->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(ib) spin_unlock(&ib->i_lock) spin_unlock(lru_lock) rm file B ib->nlink = 0 rm file A iput(ia) ubifs_evict_inode(ia) ubifs_jnl_delete_inode(ia) ubifs_jnl_write_inode(ia) make_reservation(BASEHD) // Lock wbuf->io_mutex ubifs_iget(ixa->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(ixa) | iput(ib) // ib->nlink is 0, do evict | ubifs_evict_inode | ubifs_jnl_delete_inode(ib) ↓ ubifs_jnl_write_inode ABBA deadlock ←-----make_reservation(BASEHD) dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&ixa->i_state) Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING to pin the inode in memory while inode_lru_isolate() reclaims its pages instead of using ordinary inode reference. This way inode deletion cannot be triggered from inode_lru_isolate() thus avoiding the deadlock. evict() is made to wait for I_LRU_ISOLATING to be cleared before proceeding with inode cleanup. Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022 Fixes: e50e5129 ("ext4: xattr-in-inode support") Fixes: 7959cf3a ("ubifs: journal: Handle xattrs like files") Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.comReviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Yonghong Song authored
A selftest is added such that without the previous patch, a crash can happen. With the previous patch, the test can run successfully. The new test is written in a way which mimics original crash case: main_prog static_prog_1 static_prog_2 where static_prog_1 has different paths to static_prog_2 and some path has stack allocated and some other path does not. A stacksafe() checking in static_prog_2() triggered the crash. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240812214852.214037-1-yonghong.song@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Daniel Hodges reported a kernel verifier crash when playing with sched-ext. Further investigation shows that the crash is due to invalid memory access in stacksafe(). More specifically, it is the following code: if (exact != NOT_EXACT && old->stack[spi].slot_type[i % BPF_REG_SIZE] != cur->stack[spi].slot_type[i % BPF_REG_SIZE]) return false; The 'i' iterates old->allocated_stack. If cur->allocated_stack < old->allocated_stack the out-of-bound access will happen. To fix the issue add 'i >= cur->allocated_stack' check such that if the condition is true, stacksafe() should fail. Otherwise, cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal. Fixes: 2793a8b0 ("bpf: exact states comparison for iterator convergence checks") Cc: Eduard Zingerman <eddyz87@gmail.com> Reported-by: Daniel Hodges <hodgesd@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 12 Aug, 2024 13 commits
-
-
Leon Hwang authored
The commit f7866c35 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed a NULL pointer dereference panic, but didn't fix the issue that fails to update attached freplace prog to prog_array map. Since commit 1c123c56 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog and its target prog are able to tail call each other. And the commit 3aac1ead ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL after attaching freplace prog to its target prog. After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS. Then, after attaching freplace its prog->aux->dst_prog is NULL. Then, while updating freplace in prog_array the bpf_prog_map_compatible() incorrectly returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS and update to prog_array can succeed. Fixes: f7866c35 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20240728114612.48486-2-leon.hwang@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
David Howells authored
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used correctly. The problem is that we try to set them up in the request initialisation, but we the cache may be in the process of setting up still, and so the state may not be correct. Further, we secondarily sample the cache state and make contradictory decisions later. The issue arises because we set up the cache resources, which allows the cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which triggers cache writing even if we didn't set the flags when allocating. Fix this in the following way: (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in ->init_request() rather than trying to juggle that in netfs_alloc_request(). (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is to be done, then PG_private_2 is to be used rather than only setting it if we decide to cache and then having netfs_rreq_unlock_folios() set the non-PG_private_2 writeback-to-cache if it wasn't set. (3) Split netfs_rreq_unlock_folios() into two functions, one of which contains the deprecated code for using PG_private_2 to avoid accidentally doing the writeback path - and always use it if USE_PGPRIV2 is set. (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always wait for PG_private_2. This function is deprecated and only used by ceph anyway, and so label it so. (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use fscache_operation_valid() on the cache_resources instead. This has the advantage of picking up the result of netfs_begin_cache_read() and fscache_begin_write_operation() - which are called after the object is initialised and will wait for the cache to come to a usable state. Just reverting ae678317[1] isn't a sufficient fix, so this need to be applied on top of that. Without this as well, things like: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { and: WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386 may happen, along with some UAFs due to PG_private_2 not getting used to wait on writeback completion. Fixes: 2ff1e975 ("netfs: Replace PG_fscache by setting folio->private and marking dirty") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Hristo Venev <hristo@venev.name> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.ukSigned-off-by: Christian Brauner <brauner@kernel.org>
-
David Howells authored
This reverts commit ae678317. Revert the patch that removes the deprecated use of PG_private_2 in netfslib for the moment as Ceph is actually still using this to track data copied to the cache. Fixes: ae678317 ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Mathias Krause authored
The explanatory comment above take_fd() contains a typo, fix that to not confuse readers. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240809135035.748109-1-minipli@grsecurity.netSigned-off-by: Christian Brauner <brauner@kernel.org>
-
Christian Brauner authored
It's currently possible to create pidfds for kthreads but it is unclear what that is supposed to mean. Until we have use-cases for it and we figured out what behavior we want block the creation of pidfds for kthreads. Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner Fixes: 32fcb426 ("pid: add pidfd_open()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Lukas Bulwahn authored
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First, NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas the actual intended config is called NETFS_SUPPORT. Second, the config renaming misses to adjust the documentation of the functionality of this config. Clean up those two points. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.comSigned-off-by: Christian Brauner <brauner@kernel.org>
-
yangerkun authored
After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below). 1. create 5000 files(1 2 3...) under one dir 2. call readdir(man 3 readdir) once, and get one entry 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) 4. loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition) We choose the same logic what commit 9b378f6a ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e45955 ("shmem: stable directory offsets") Signed-off-by: yangerkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.comReviewed-by: Chuck Lever <chuck.lever@oracle.com> [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Christian Brauner authored
The kernel is writing an object of type __u64, so the ioctl has to be defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5). Reported-by: Dmitry V. Levin <ldv@strace.io> Link: https://lore.kernel.org/r/20240730164554.GA18486@altlinux.orgSigned-off-by: Christian Brauner <brauner@kernel.org>
-
Max Kellermann authored
This fixes a NULL pointer dereference bug due to a data race which looks like this: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_rreq_write_to_cache_work RIP: 0010:cachefiles_prepare_write+0x30/0xa0 Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10 RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286 RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000 RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438 RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001 R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68 R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00 FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x15d/0x440 ? search_module_extables+0xe/0x40 ? fixup_exception+0x22/0x2f0 ? exc_page_fault+0x5f/0x100 ? asm_exc_page_fault+0x22/0x30 ? cachefiles_prepare_write+0x30/0xa0 netfs_rreq_write_to_cache_work+0x135/0x2e0 process_one_work+0x137/0x2c0 worker_thread+0x2e9/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: CR2: 0000000000000008 ---[ end trace 0000000000000000 ]--- This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(); this led to a fscache_cookie_lru_do_one() call, setting the FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by fscache_cookie_state_machine(), withdrawing the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv. At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line: struct cachefiles_object *object = cachefiles_cres_object(cres); The next line crashes, obviously: struct cachefiles_cache *cache = object->volume->cache; During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero. The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE"). This patch adds the missing check. With a non-zero access counter, the function returns and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD. Fixes: 12bb21a2 ("fscache: Implement cookie user counting and resource pinning") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Omar Sandoval authored
When struct file_lease was split out from struct file_lock, the name of the file_lock slab cache was copied to the new slab cache for file_lease. This name conflict causes confusion in /proc/slabinfo and /sys/kernel/slab. In particular, it caused failures in drgn's test case for slab cache merging. Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81 Fixes: c69ff407 ("filelock: split leases out of struct file_lock") Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.comReviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Matthew Wilcox (Oracle) authored
As in commit 4e527d58 ("iomap: fault in smaller chunks for non-large folio mappings"), we can see a performance loss for filesystems which have not yet been converted to large folios. Fixes: c38f4e96 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240527201735.1898381-1-willy@infradead.orgReviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "While the ideapad concurrency fix itself is relatively straightforward, it required moving code around and adding a bit of supporting infrastructure to have a clean inter-driver interface. This shows up in the diffstats. - ideapad-laptop / lenovo-ymc: Protect VPC calls with a mutex - amd/pmf: Query HPD data also when ALS is disabled" * tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc platform/x86: ideapad-laptop: introduce a generic notification chain platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull fd bitmap fix from Al Viro: "Fix bitmap corruption on close_range() by cleaning up copy_fd_bitmaps()" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
-
- 11 Aug, 2024 9 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Thomas Gleixner: - Fix 32-bit PTI for real. pti_clone_entry_text() is called twice, once before initcalls so that initcalls can use the user-mode helper and then again after text is set read only. Setting read only on 32-bit might break up the PMD mapping, which makes the second invocation of pti_clone_entry_text() find the mappings out of sync and failing. Allow the second call to split the existing PMDs in the user mapping and synchronize with the kernel mapping. - Don't make acpi_mp_wake_mailbox read-only after init as the mail box must be writable in the case that CPU hotplug operations happen after boot. Otherwise the attempt to start a CPU crashes with a write to read only memory. - Add a missing sanity check in mtrr_save_state() to ensure that the fixed MTRR MSRs are supported. Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but the WARN_ON() can bring systems down when panic on warn is set. * tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed MTRRs exist before saving them x86/paravirt: Fix incorrect virt spinlock setting on bare metal x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox x86/mm: Fix PTI for i386 some more
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull time keeping fixes from Thomas Gleixner: - Fix a couple of issues in the NTP code where user supplied values are neither sanity checked nor clamped to the operating range. This results in integer overflows and eventualy NTP getting out of sync. According to the history the sanity checks had been removed in favor of clamping the values, but the clamping never worked correctly under all circumstances. The NTP people asked to not bring the sanity checks back as it might break existing applications. Make the clamping work correctly and add it where it's missing - If adjtimex() sets the clock it has to trigger the hrtimer subsystem so it can adjust and if the clock was set into the future expire timers if needed. The caller should provide a bitmask to tell hrtimers which clocks have been adjusted. adjtimex() uses not the proper constant and uses CLOCK_REALTIME instead, which is 0. So hrtimers adjusts only the clocks, but does not check for expired timers, which might make them expire really late. Use the proper bitmask constant instead. * tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex() ntp: Safeguard against time_constant overflow ntp: Clamp maxerror and esterror to operating range
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Thomas Gleixner: "Three small fixes for interrupt core and drivers: - The interrupt core fails to honor caller supplied affinity hints for non-managed interrupts and uses the system default affinity on startup instead. Set the missing flag in the descriptor to tell the core to use the provided affinity. - Fix a shift out of bounds error in the Xilinx driver - Handle switching to level trigger correctly in the RISCV APLIC driver. It failed to retrigger the interrupt which causes it to become stale" * tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration irqchip/xilinx: Fix shift out of bounds genirq/irqdesc: Honor caller provided affinity in alloc_desc()
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fixes from Greg KH: "Here are a number of small USB driver fixes for reported issues for 6.11-rc3. Included in here are: - usb serial driver MODULE_DESCRIPTION() updates - usb serial driver fixes - typec driver fixes - usb-ip driver fix - gadget driver fixes - dt binding update All of these have been in linux-next with no reported issues" * tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common() usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt() usb: gadget: f_fs: restore ffs_func_disable() functionality USB: serial: debug: do not echo input by default usb: typec: tipd: Delete extra semi-colon usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch() usb: gadget: u_serial: Set start_delayed during suspend usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap() usb: typec: fsa4480: Check if the chip is really there usb: gadget: core: Check for unset descriptor usb: vhci-hcd: Do not drop references before new references are gained usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed. usb: gadget: midi2: Fix the response for FB info with block 0xff dt-bindings: usb: microchip,usb2514: Add USB2517 compatible USB: serial: garmin_gps: use struct_size() to allocate pkt USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by USB: serial: add missing MODULE_DESCRIPTION() macros USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty / serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for reported problems for 6.11-rc3. Included in here are: - sc16is7xx serial driver fixes - uartclk bugfix for a divide by zero issue - conmakehash userspace build issue fix All of these have been in linux-next for a while with no reported issues" * tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: vt: conmakehash: cope with abs_srctree no longer in env serial: sc16is7xx: fix invalid FIFO access with special register set serial: sc16is7xx: fix TX fifo corruption serial: core: check uartclk for zero to avoid divide by zero
-
Linus Torvalds authored
Merge tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / documentation fixes from Greg KH: "Here are some small fixes, and some documentation updates for 6.11-rc3. Included in here are: - embargoed hardware documenation updates based on a lot of review by legal-types in lots of companies to try to make the process a _bit_ easier for us to manage over time. - rust firmware documentation fix - driver detach race fix for the fix that went into 6.11-rc1 All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Fix uevent_show() vs driver detach race Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes rust: firmware: fix invalid rustdoc link
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char/misc fixes from Greg KH: "Here are some small char/misc/other driver fixes for 6.11-rc3 for reported issues. Included in here are: - binder driver fixes - fsi MODULE_DESCRIPTION() additions (people seem to love them...) - eeprom driver fix - Kconfig dependency fix to resolve build issues - spmi driver fixes All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: spmi: pmic-arb: add missing newline in dev_err format strings spmi: pmic-arb: Pass the correct of_node to irq_domain_add_tree binder_alloc: Fix sleeping function called from invalid context binder: fix descriptor lookup for context manager char: add missing NetWinder MODULE_DESCRIPTION() macros misc: mrvl-cn10k-dpi: add PCI_IOV dependency eeprom: ee1004: Fix locking issues in ee1004_probe() fsi: add missing MODULE_DESCRIPTION() macros
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Two core fixes: one to prevent discard type changes (seen on iSCSI) during intermittent errors and the other is fixing a lockdep problem caused by the queue limits change. And one driver fix in ufs" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Keep the discard mode stable scsi: sd: Move sd_read_cpr() out of the q->limits_lock region scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
-
- 10 Aug, 2024 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds authored
Pull nfsd fixes from Chuck Lever: - Two minor fixes for recent changes * tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets sunrpc: avoid -Wformat-security warning
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: - Two fixes for SMBusAlert handling in the I2C core: one to avoid an endless loop when scanning for handlers and one to make sure handlers are always called even if HW has broken behaviour - I2C header build fix for when ACPI is enabled but I2C isn't - The testunit gets a rename in the code to match the documentation - Two fixes for the Qualcomm GENI I2C controller are cleaning up the error exit patch in the runtime_resume() function. The first is disabling the clock, the second disables the icc on the way out * tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: testunit: match HostNotify test name with docs i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume i2c: Fix conditional for substituting empty ACPI functions i2c: smbus: Send alert notifications to all devices if source not found i2c: smbus: Improve handling of stuck alerts
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: - avoid a deadlock with dma-debug and netconsole (Rik van Riel) * tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: avoid deadlock between dma debug vs printk and netconsole
-
git://evilpiepirate.org/bcachefsLinus Torvalds authored
Pull more bcachefs fixes from Kent Overstreet: "A couple last minute fixes for the new disk accounting - fix a bug that was causing ACLs to seemingly "disappear" - new on disk format version, bcachefs_metadata_version_disk_accounting_v3 bcachefs_metadata_version_disk_accounting_v2 accidentally included padding in disk_accounting_key; fortunately, 6.11 isn't out yet so we can fix this with another version bump" * tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs: bcachefs: bcachefs_metadata_version_disk_accounting_v3 bcachefs: improve bch2_dev_usage_to_text() bcachefs: bch2_accounting_invalid() bcachefs: Switch to .get_inode_acl()
-
Yong-Xuan Wang authored
The section 4.5.2 of the RISC-V AIA specification says that "any write to a sourcecfg register of an APLIC might (or might not) cause the corresponding interrupt-pending bit to be set to one if the rectified input value is high (= 1) under the new source mode." When the interrupt type is changed in the sourcecfg register, the APLIC device might not set the corresponding pending bit, so the interrupt might never become pending. To handle sourcecfg register changes for level-triggered interrupts in MSI mode, manually set the pending bit for retriggering interrupt so it gets retriggered if it was already asserted. Fixes: ca8df97f ("irqchip/riscv-aplic: Add support for MSI-mode") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240809071049.2454-1-yongxuan.wang@sifive.com
-
Radhey Shyam Pandey authored
The device tree property 'xlnx,kind-of-intr' is sanity checked that the bitmask contains only set bits which are in the range of the number of interrupts supported by the controller. The check is done by shifting the mask right by the number of supported interrupts and checking the result for zero. The data type of the mask is u32 and the number of supported interrupts is up to 32. In case of 32 interrupts the shift is out of bounds, resulting in a mismatch warning. The out of bounds condition is also reported by UBSAN: UBSAN: shift-out-of-bounds in irq-xilinx-intc.c:332:22 shift exponent 32 is too large for 32-bit type 'unsigned int' Fix it by promoting the mask to u64 for the test. Fixes: d50466c9 ("microblaze: intc: Refactor DT sanity check") Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/1723186944-3571957-1-git-send-email-radhey.shyam.pandey@amd.com
-