1. 03 Jan, 2024 2 commits
    • Merge tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 360f0342
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix a NULL kernel dereference in set_gid() on tracefs mounting.
      
         When tracefs is mounted with "gid=1000", it will update the existing
         dentries to have the new gid. The tracefs_inode which is retrieved by
         a container_of(dentry->d_inode) has flags to see if the inode belongs
         to the eventfs system.
      
         The bug triggered when getdents() was called on a mounted tracefs
         and the directory was left open. That leaves a "cursor dentry" in
         the subdirs list of the dentries that set_gid() walks. On a remount
         of tracefs, set_gid() would pass the cursor's NULL dentry->d_inode
         to container_of() and crash when the result was dereferenced.
      
         Simply have a check for dentry->d_inode to see if it is NULL and if
         so, skip that entry.
      
       - Fix the bits of the eventfs_inode structure.
      
         The "is_events" bit was taken from the nr_entries field, but the
         nr_entries field wasn't updated to be 30 bits and was still 31.
         Including the "is_freed" bit this would use 33 bits which would make
         the structure use another integer for just one bit.
      
      * tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Fix bitwise fields for "is_events"
        tracefs: Check for dentry->d_inode exists in set_gid()
    • Merge tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefs · 981d0413
      Linus Torvalds authored
      Pull bcachefs from Kent Overstreet:
       "More bcachefs bugfixes for 6.7, and forwards compatibility work:
      
         - fix for a nasty extents + snapshot interaction, reported when
           reflink of a snapshotted file wouldn't complete but turned out to
           be a more general bug
      
         - fix for an invalid free in dio write path when iov vector was
           longer than our inline vector
      
         - fix for a buffer overflow in the nocow write path -
           BCH_REPLICAS_MAX doesn't actually limit the number of pointers in
           an extent when cached pointers are included
      
         - RO snapshots are actually RO now
      
         - And, a new superblock section to avoid future breakage when the
           disk space accounting rewrite rolls out: the new superblock section
           describes versions that need work to downgrade, where the work
           required is a list of recovery passes and errors to silently fix"
      
      * tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: make RO snapshots actually RO
        bcachefs: bch_sb_field_downgrade
        bcachefs: bch_sb.recovery_passes_required
        bcachefs: Add persistent identifiers for recovery passes
        bcachefs: prt_bitflags_vector()
        bcachefs: move BCH_SB_ERRS() to sb-errors_types.h
        bcachefs: fix buffer overflow in nocow write path
        bcachefs: DARRAY_PREALLOCATED()
        bcachefs: Switch darray to kvmalloc()
        bcachefs: Factor out darray resize slowpath
        bcachefs: fix setting version_upgrade_complete
        bcachefs: fix invalid free in dio write path
        bcachefs: Fix extents iteration + snapshots interaction
  2. 02 Jan, 2024 2 commits
    • eventfs: Fix bitwise fields for "is_events" · fd56cd5f
      Steven Rostedt (Google) authored
      A flag was needed to denote which eventfs_inode was the "events"
      directory, so a bit was taken from the "nr_entries" field, as there's not
      that many entries, and 2^30 is plenty. But the bit number for nr_entries
      was not updated to reflect the bit taken from it, which would add an
      unnecessary integer to the structure.
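
      The bit arithmetic above can be checked with a small userspace sketch.
      The two structs are hypothetical stand-ins for the flag bits of
      eventfs_inode; only the bit widths (1 + 1 + 31 versus 1 + 1 + 30) come
      from the text. Bit-field layout is implementation-defined, so the sizes
      assume a typical Linux ABI with a 32-bit unsigned int.

```c
#include <stddef.h>

/* Hypothetical stand-in: layout before the fix, 1 + 1 + 31 = 33 bits. */
struct flags_before {
	unsigned int is_freed:1;
	unsigned int is_events:1;
	unsigned int nr_entries:31;	/* does not fit in the remaining 30 bits */
};

/* Hypothetical stand-in: layout after the fix, 1 + 1 + 30 = 32 bits. */
struct flags_after {
	unsigned int is_freed:1;
	unsigned int is_events:1;
	unsigned int nr_entries:30;
};

/* Assumption: unsigned int is 32 bits, so the fixed layout fills it exactly. */
_Static_assert(sizeof(struct flags_after) == sizeof(unsigned int),
	       "flag bits should fit in a single unsigned int");

/* Bytes saved by shrinking nr_entries from 31 to 30 bits. */
size_t bytes_saved(void)
{
	return sizeof(struct flags_before) - sizeof(struct flags_after);
}
```

      On such an ABI bytes_saved() is expected to be non-zero, because the
      33rd bit forces the compiler to start a second storage unit.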
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240102151832.7ca87275@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 7e8358ed ("eventfs: Fix file and directory uid and gid ownership")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracefs: Check for dentry->d_inode exists in set_gid() · ad579864
      Steven Rostedt (Google) authored
      If a getdents() is called on the tracefs directory but does not get all
      the files, it can leave a "cursor" dentry in the d_subdirs list of tracefs
      dentry. This cursor dentry has no d_inode. Before referencing the
      tracefs_inode from the dentry, d_inode must first be checked for NULL.
      If it is NULL, the dentry has no tracefs_inode and can be ignored.
      
      The following caused a crash:
      
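       #include <fcntl.h>
       #include <sys/mount.h>
       #include <sys/stat.h>
       #include <sys/syscall.h>
       #include <unistd.h>
      
       #define getdents64(fd, dirp, count) syscall(SYS_getdents64, fd, dirp, count)
       #define BUF_SIZE 256
       #define TDIR "/tmp/file0"
      
       int main(void)
       {
      	char buf[BUF_SIZE];
      	int fd;
      	int n;
      	int ret;
      
      	mkdir(TDIR, 0777);
      	mount(NULL, TDIR, "tracefs", 0, NULL);
      	fd = openat(AT_FDCWD, TDIR, O_RDONLY);
      	/* BUF_SIZE is too small for all entries, so a cursor dentry is left */
      	n = getdents64(fd, buf, BUF_SIZE);
      	/* remounting with gid= walks every dentry, including the cursor */
      	ret = mount(NULL, TDIR, NULL, MS_NOSUID|MS_REMOUNT|MS_RELATIME|MS_LAZYTIME,
      		    "gid=1000");
      	return 0;
       }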
      
      That's because the 256 BUF_SIZE was not big enough to read all the
      dentries of the tracefs file system and it left a "cursor" dentry in the
      subdirs of the tracefs root inode. Then on remounting with "gid=1000",
      it would cause an iteration of all dentries which hit:
      
      	ti = get_tracefs(dentry->d_inode);
      	if (ti && (ti->flags & TRACEFS_EVENT_INODE))
      		eventfs_update_gid(dentry, gid);
      
      Which crashed because of the dereference of the cursor dentry which had a NULL
      d_inode.
      
      In the subdir loop of the dentry lookup of set_gid(), if a child has a
      NULL d_inode, simply skip it.
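
      The skip described above can be sketched in userspace as follows. This
      is a minimal illustration, not the kernel code: struct inode, struct
      dentry and set_gid_children() are simplified, hypothetical stand-ins,
      and the child list is a plain singly linked list.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode {
	unsigned int gid;
};

struct dentry {
	struct inode *d_inode;	/* NULL for a getdents() cursor dentry */
	struct dentry *next;	/* next sibling in the parent's child list */
};

/*
 * Walk the children and update the gid of each one that has an inode.
 * Entries with a NULL d_inode (cursors) are skipped rather than
 * dereferenced. Returns the number of entries updated.
 */
int set_gid_children(struct dentry *first_child, unsigned int gid)
{
	int updated = 0;

	for (struct dentry *d = first_child; d; d = d->next) {
		if (!d->d_inode)
			continue;	/* cursor dentry: nothing to update */
		d->d_inode->gid = gid;
		updated++;
	}
	return updated;
}
```

      With a list of three children where the middle one is a cursor, the
      walk updates the two real entries and leaves the cursor untouched.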
      
      Link: https://lore.kernel.org/all/20240102135637.3a21fb10@gandalf.local.home/
      Link: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da244d@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 7e8358ed ("eventfs: Fix file and directory uid and gid ownership")
      Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  3. 01 Jan, 2024 13 commits
  4. 31 Dec, 2023 3 commits
  5. 30 Dec, 2023 5 commits
    • Merge tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 453f5db0
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix readers that are blocked on the ring buffer when buffer_percent
         is 100%. They are supposed to wake up when the buffer is full, but
         because the sub-buffer that the writer is on is never considered
         "dirty" in the calculation, dirty pages will never equal nr_pages.
         Add +1 to the dirty count in order to count for the sub-buffer that
         the writer is on.
      
       - When a reader is blocked on the "snapshot_raw" file, it is to be
         woken up when a snapshot is done and be able to read the snapshot
         buffer. But because the snapshot swaps the buffers (the main one with
         the snapshot one), and the snapshot reader is waiting on the old
         snapshot buffer, it was not woken up (because it is now on the main
         buffer after the swap). Worse yet, when it reads the buffer after a
         snapshot, it's not reading the snapshot buffer, it's reading the live
         active main buffer.
      
         Fix this by forcing a wakeup of all readers on the snapshot buffer
         when a new snapshot happens, and then update the buffer that the
         reader is reading to be back on the snapshot buffer.
      
       - Fix the modification of the direct_function hash. There was a race
         when new functions were added to the direct_function hash as when it
         moved function entries from the old hash to the new one, a direct
         function trace could be hit and not see its entry.
      
         This is fixed by allocating the new hash, copy all the old entries
         onto it as well as the new entries, and then use rcu_assign_pointer()
         to update the new direct_function hash with it.
      
         This also fixes a memory leak in that code.
      
       - Fix eventfs ownership
      
      * tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ftrace: Fix modification of direct_function hash while in use
        tracing: Fix blocked reader of snapshot buffer
        ring-buffer: Fix wake ups when buffer_percent is set to 100
        eventfs: Fix file and directory uid and gid ownership
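
      The buffer_percent fix in the first item above comes down to a small
      piece of arithmetic, sketched here in userspace. The function name and
      signature are hypothetical; the only detail taken from the text is that
      the sub-buffer the writer is on is never counted as dirty, so one is
      added before comparing against the threshold.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when the dirty share of the buffer has
 * reached 'percent'. 'dirty_pages' excludes the sub-buffer the writer is
 * currently on, so it can be at most nr_pages - 1.
 */
bool buffer_percent_hit(size_t nr_pages, size_t dirty_pages,
			unsigned int percent)
{
	if (nr_pages == 0 || percent == 0)
		return true;

	/* Count the writer's sub-buffer so that 100% is reachable. */
	size_t dirty = dirty_pages + 1;

	return dirty * 100 >= (size_t)percent * nr_pages;
}
```

      With 8 sub-buffers, at most 7 can ever be reported dirty. Without the
      +1 the comparison at 100% would be 700 >= 800 and a blocked reader
      would never wake; with it, 800 >= 800 holds.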
    • locking/osq_lock: Clarify osq_wait_next() · b106bcf0
      David Laight authored
      Directly return NULL or 'next' instead of breaking out of the loop.
      Signed-off-by: David Laight <david.laight@aculab.com>
      [ Split original patch into two independent parts  - Linus ]
      Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • locking/osq_lock: Clarify osq_wait_next() calling convention · 563adbfc
      David Laight authored
      osq_wait_next() is passed 'prev' from osq_lock() and NULL from
      osq_unlock() but only needs the 'cpu' value to write to lock->tail.
      
      Just pass prev->cpu or OSQ_UNLOCKED_VAL instead.
      
      Should have no effect on the generated code since gcc manages to assume
      that 'prev != NULL' due to an earlier dereference.
      Signed-off-by: David Laight <david.laight@aculab.com>
      [ Changed 'old' to 'old_cpu' by request from Waiman Long  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c · 7c223098
      David Laight authored
      struct optimistic_spin_node is private to the implementation.
      Move it into the C file to ensure nothing is accessing it.
      Signed-off-by: David Laight <david.laight@aculab.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ftrace: Fix modification of direct_function hash while in use · d05cb470
      Steven Rostedt (Google) authored
      Masami Hiramatsu reported a memory leak in register_ftrace_direct() that
      occurs when the number of new entries added is large enough to cause two
      allocations in the loop:
      
              for (i = 0; i < size; i++) {
                      hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
                              new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
                              if (!new)
                                      goto out_remove;
                              entry->direct = addr;
                      }
              }
      
      Where ftrace_add_rec_direct() has:
      
              if (ftrace_hash_empty(direct_functions) ||
                  direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
                      struct ftrace_hash *new_hash;
                      int size = ftrace_hash_empty(direct_functions) ? 0 :
                              direct_functions->count + 1;
      
                      if (size < 32)
                              size = 32;
      
                      new_hash = dup_hash(direct_functions, size);
                      if (!new_hash)
                              return NULL;
      
                      *free_hash = direct_functions;
                      direct_functions = new_hash;
              }
      
      The "*free_hash = direct_functions;" can happen twice, losing the previous
      allocation of direct_functions.
      
      But this also exposed a more serious bug.
      
      The modification of direct_functions above is not safe. As
      direct_functions can be referenced at any time to find what direct caller
      it should call, the time between:
      
                      new_hash = dup_hash(direct_functions, size);
       and
                      direct_functions = new_hash;
      
      can have a race with another CPU (or even this one if it gets interrupted),
      and the entries being moved to the new hash are not referenced.
      
      That's because the "dup_hash()" is really misnamed and is really a
      "move_hash()". It moves the entries from the old hash to the new one.
      
      Now even if that was changed, this code is not proper as direct_functions
      should not be updated until the end. That is the best way to handle
      function reference changes, and is the way other parts of ftrace handle
      this.
      
      The following is done:
      
       1. Change add_hash_entry() to return the entry it created and inserted
          into the hash, and not just return success or not.
      
       2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
          the former.
      
       3. Allocate a "new_hash" at the start that is made for holding both the
          new hash entries as well as the existing entries in direct_functions.
      
       4. Copy (not move) the direct_function entries over to the new_hash.
      
       5. Copy the entries of the added hash to the new_hash.
      
       6. If everything succeeds, then use rcu_assign_pointer() to update the
          direct_functions with the new_hash.
      
      This simplifies the code and fixes both the memory leak as well as the
      race condition mentioned above.
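
      The copy-then-publish pattern in the steps above can be sketched in
      userspace as follows. This is only an illustration under stated
      assumptions: struct table, table_add() and table_contains() are
      hypothetical, a flat array stands in for the ftrace hash, and a C11
      release store stands in for rcu_assign_pointer(). The caller is
      assumed to free the old table only once no reader can still hold it.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat table standing in for the direct_functions hash. */
struct table {
	size_t count;
	unsigned long ips[];
};

/* The published table; readers only ever see a fully built one. */
static _Atomic(struct table *) direct_table;

/*
 * Build a new table holding the old entries plus 'n' new ones, then publish
 * it with a single store. On failure the published table is untouched.
 * On success the previous table is handed back through 'old_out'.
 */
int table_add(const unsigned long *new_ips, size_t n, struct table **old_out)
{
	struct table *old = atomic_load_explicit(&direct_table,
						 memory_order_relaxed);
	size_t old_count = old ? old->count : 0;
	struct table *new_tab;

	new_tab = malloc(sizeof(*new_tab) +
			 (old_count + n) * sizeof(new_tab->ips[0]));
	if (!new_tab)
		return -1;

	/* Copy (not move) the existing entries, then append the new ones. */
	if (old_count)
		memcpy(new_tab->ips, old->ips,
		       old_count * sizeof(new_tab->ips[0]));
	if (n)
		memcpy(new_tab->ips + old_count, new_ips,
		       n * sizeof(new_tab->ips[0]));
	new_tab->count = old_count + n;

	/* Single publication point, only reached when everything succeeded. */
	atomic_store_explicit(&direct_table, new_tab, memory_order_release);

	*old_out = old;
	return 0;
}

/* Reader side: look an address up in whichever table is published. */
int table_contains(unsigned long ip)
{
	struct table *t = atomic_load_explicit(&direct_table,
					       memory_order_acquire);

	for (size_t i = 0; t && i < t->count; i++)
		if (t->ips[i] == ip)
			return 1;
	return 0;
}
```

      Because the old table is never modified, a lookup that races with
      table_add() sees either the complete old table or the complete new
      one, and since only one new table is allocated per call there is no
      second allocation to lose track of.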
      
      Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
      Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Fixes: 763e34e7 ("ftrace: Add register_ftrace_direct()")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  6. 29 Dec, 2023 10 commits
  7. 28 Dec, 2023 5 commits