1. 09 Mar, 2020 1 commit
    • io_uring: ensure RCU callback ordering with rcu_barrier() · 805b13ad
      Jens Axboe authored
      After more careful study, Paul informs me that we cannot rely on the
      ordering of RCU callbacks in the way that the tagged commit did.
      The current construct looks like this:
      
      	void C(struct rcu_head *rhp)
      	{
      		do_something(rhp);
      		call_rcu(&p->rh, B);
      	}
      
      	call_rcu(&p->rh, A);
      	call_rcu(&p->rh, C);
      
      and we're relying on ordering between A and B, which isn't guaranteed.
      Make this explicit instead, and have a work item issue the rcu_barrier()
      to ensure that A has run before we manually execute B.
      
      While thorough testing never triggered this issue, whether it occurs
      depends on the per-CPU RCU callback load. The updated method also
      simplifies the code and eliminates the need to maintain an rcu_head in
      the fileset data.
      
      Fixes: c1e2148f ("io_uring: free fixed_file_data after RCU grace period")
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 07 Mar, 2020 1 commit
    • io_uring: fix lockup with timeouts · f0e20b89
      Pavel Begunkov authored
      There is a recipe to deadlock the kernel: submit a timeout sqe with a
      linked_timeout (e.g. test_single_link_timeout_ception() from liburing),
      and SIGKILL the process.

      Then io_kill_timeouts() takes @ctx->completion_lock, but the timeout
      isn't flagged with REQ_F_COMP_LOCKED, so it will try to grab the lock
      a second time in io_put_free() when cancelling the linked timeout. The
      same can probably happen at the other io_kill_timeout() call site,
      io_commit_cqring().

      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 06 Mar, 2020 1 commit
    • io_uring: free fixed_file_data after RCU grace period · c1e2148f
      Jens Axboe authored
      The percpu refcount protects this structure, and we can have an atomic
      switch in progress when exiting. This makes it unsafe to just free the
      struct normally, and can trigger the following KASAN warning:
      
      BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
      Read of size 1 at addr ffff888181a19a30 by task swapper/0/0
      
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack+0x76/0xa0
       print_address_description.constprop.0+0x3b/0x60
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       __kasan_report.cold+0x1a/0x3d
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       rcu_core+0x370/0x830
       ? percpu_ref_exit+0x50/0x50
       ? rcu_note_context_switch+0x7b0/0x7b0
       ? run_rebalance_domains+0x11d/0x140
       __do_softirq+0x10a/0x3e9
       irq_exit+0xd5/0xe0
       smp_apic_timer_interrupt+0x86/0x200
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:default_idle+0x26/0x1f0
      
      Fix this by punting the final exit and free of the struct to RCU, so
      that we know it's safe to do so. Jann suggested the approach of using a
      double RCU callback to achieve this. It's important that we do a nested
      call_rcu() callback, as otherwise the free could be ordered before the
      atomic switch, even if the latter was already queued.
      
      Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
      Suggested-by: Jann Horn <jannh@google.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 02 Mar, 2020 2 commits
  5. 01 Mar, 2020 5 commits
  6. 29 Feb, 2020 4 commits
    • ext4: potential crash on allocation error in ext4_alloc_flex_bg_array() · 37b0b6b8
      Dan Carpenter authored
      If sbi->s_flex_groups_allocated is zero and the first allocation fails,
      then this code will crash.  The problem is that "i--" will set "i" to
      -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
      is type promoted to unsigned and becomes UINT_MAX.  Since UINT_MAX
      is more than zero, the condition is true so we call kvfree(new_groups[-1]).
      The loop will carry on freeing invalid memory until it crashes.
      
      Fixes: 7c990728 ("ext4: fix potential race between s_flex_groups online resizing and access")
      Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • macintosh: therm_windtunnel: fix regression when instantiating devices · 38b17afb
      Wolfram Sang authored
      Removing attach_adapter from this driver caused a regression for at
      least some machines. Those machines had the sensors described in their
      DT, too, so they didn't need manual creation of the sensor devices. The
      old code worked, though, because manual creation came first. Creation of
      DT devices then failed later and caused error logs, but the sensors
      worked nonetheless because of the manually created devices.
      
      With attach_adapter removed, manual creation now comes later and loses
      the race. The sensor devices were already registered via DT, but with
      a different binding, so the driver could not bind to them.

      This fix refactors the code to remove the race and only creates
      devices manually if no DT nodes are present. The DT binding is also
      updated to match both the DT-created and the manually created devices.
      Because we don't know which creation method will be used at runtime,
      the code to start the kthread is moved to do_probe(), which both
      methods call.
      
      Fixes: 3e7bed52 ("macintosh: therm_windtunnel: drop using attach_adapter")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
      Reported-by: Erhard Furtner <erhard_f@mailbox.org>
      Tested-by: Erhard Furtner <erhard_f@mailbox.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org # v4.19+
    • jbd2: fix data races at struct journal_head · 6c5d9112
      Qian Cai authored
      journal_head::b_transaction and journal_head::b_next_transaction can
      be accessed concurrently, as KCSAN noticed:
      
       LTP: starting fsync04
       /dev/zero: Can't open blockdev
       EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
       EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
       ==================================================================
       BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
      
       write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
        __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
        __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
        jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
        (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
        kjournald2+0x13b/0x450 [jbd2]
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
        jbd2_write_access_granted+0x1b2/0x250 [jbd2]
        jbd2_write_access_granted at fs/jbd2/transaction.c:1155
        jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
        __ext4_journal_get_write_access+0x50/0x90 [ext4]
        ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
        ext4_mb_new_blocks+0x54f/0xca0 [ext4]
        ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
        ext4_map_blocks+0x3b4/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block+0x3b/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       5 locks held by fsync04/25724:
        #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
        #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
        #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
        #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
       irq event stamp: 1407125
       hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The plain reads are outside the jh->b_state_lock critical section, which
      results in data races. Fix them by adding pairs of READ|WRITE_ONCE().

      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 7557c1b3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Four small fixes.
      
        Three are in drivers for fairly obvious bugs. The fourth is a set of
        regressions introduced by the compat_ioctl changes because some of the
        compat updates wrongly replaced .ioctl instead of .compat_ioctl"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places
        scsi: zfcp: fix wrong data and display format of SFP+ temperature
        scsi: sd_sbc: Fix sd_zbc_report_zones()
        scsi: libfc: free response frame from GPN_ID
  7. 28 Feb, 2020 21 commits
  8. 27 Feb, 2020 5 commits