Commits · ce3e4cfb59cb382f8e5ce359238aa580d4ae7778 · Kirill Smelkov / linux

24 Apr, 2019 10 commits

bcache: add failure check to run_cache_set() for journal replay · ce3e4cfb

Coly Li authored Apr 25, 2019

Currently run_cache_set() has no return value, if there is failure in
bch_journal_replay(), the caller of run_cache_set() has no idea about
such failure and just continue to execute following code after
run_cache_set().  The internal failure is triggered inside
bch_journal_replay() and being handled in async way. This behavior is
inefficient, while failure handling inside bch_journal_replay(), cache
register code is still running to start the cache set. Registering and
unregistering code running as same time may introduce some rare race
condition, and make the code to be more hard to be understood.

This patch adds return value to run_cache_set(), and returns -EIO if
bch_journal_rreplay() fails. Then caller of run_cache_set() may detect
such failure and stop registering code flow immedidately inside
register_cache_set().

If journal replay fails, run_cache_set() can report error immediately
to register_cache_set(). This patch makes the failure handling for
bch_journal_replay() be in synchronized way, easier to understand and
debug, and avoid poetential race condition for register-and-unregister
in same time.
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ce3e4cfb

bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim() · 1bee2add

Coly Li authored Apr 25, 2019

In journal_reclaim() ja->cur_idx of each cache will be update to
reclaim available journal buckets. Variable 'int n' is used to count how
many cache is successfully reclaimed, then n is set to c->journal.key
by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache()
loop will write the jset data onto each cache.

The problem is, if all jouranl buckets on each cache is full, the
following code in journal_reclaim(),

529 for_each_cache(ca, c, iter) {
530       struct journal_device *ja = &ca->journal;
531       unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
532
533       /* No space available on this device */
534       if (next == ja->discard_idx)
535               continue;
536
537       ja->cur_idx = next;
538       k->ptr[n++] = MAKE_PTR(0,
539                         bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
540                         ca->sb.nr_this_dev);
541 }
542
543 bkey_init(k);
544 SET_KEY_PTRS(k, n);

If there is no available bucket to reclaim, the if() condition at line
534 will always true, and n remains 0. Then at line 544, SET_KEY_PTRS()
will set KEY_PTRS field of c->journal.key to 0.

Setting KEY_PTRS field of c->journal.key to 0 is wrong. Because in
journal_write_unlocked() the journal data is written in following loop,

649	for (i = 0; i < KEY_PTRS(k); i++) {
650-671		submit journal data to cache device
672	}

If KEY_PTRS field is set to 0 in jouranl_reclaim(), the journal data
won't be written to cache device here. If system crahed or rebooted
before bkeys of the lost journal entries written into btree nodes, data
corruption will be reported during bcache reload after rebooting the
system.

Indeed there is only one cache in a cache set, there is no need to set
KEY_PTRS field in journal_reclaim() at all. But in order to keep the
for_each_cache() logic consistent for now, this patch fixes the above
problem by not setting 0 KEY_PTRS of journal key, if there is no bucket
available to reclaim.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1bee2add

bcache: move definition of 'int ret' out of macro read_bucket() · 14215ee0

Coly Li authored Apr 25, 2019

'int ret' is defined as a local variable inside macro read_bucket().
Since this macro is called multiple times, and following patches will
use a 'int ret' variable in bch_journal_read(), this patch moves
definition of 'int ret' from macro read_bucket() to range of function
bch_journal_read().
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14215ee0

bcache: fix a race between cache register and cacheset unregister · a4b732a2

Liang Chen authored Apr 25, 2019

There is a race between cache device register and cache set unregister.
For an already registered cache device, register_bcache will call
bch_is_open to iterate through all cachesets and check every cache
there. The race occurs if cache_set_free executes at the same time and
clears the caches right before ca is dereferenced in bch_is_open_cache.
To close the race, let's make sure the clean up work is protected by
the bch_register_lock as well.

This issue can be reproduced as follows,
while true; do echo /dev/XXX> /sys/fs/bcache/register ; done&
while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done &

and results in the following oops,

[  +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998
[  +0.000457] #PF error: [normal kernel read fault]
[  +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0
[  +0.000388] Oops: 0000 [#1] SMP PTI
[  +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6
[  +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
[  +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache]
[  +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d
[  +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202
[  +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000
[  +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001
[  +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a
[  +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0
[  +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620
[  +0.000384] FS:  00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000
[  +0.000420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0
[  +0.000837] Call Trace:
[  +0.000682]  ? _cond_resched+0x10/0x20
[  +0.000691]  ? __kmalloc+0x131/0x1b0
[  +0.000710]  kernfs_fop_write+0xfa/0x170
[  +0.000733]  __vfs_write+0x2e/0x190
[  +0.000688]  ? inode_security+0x10/0x30
[  +0.000698]  ? selinux_file_permission+0xd2/0x120
[  +0.000752]  ? security_file_permission+0x2b/0x100
[  +0.000753]  vfs_write+0xa8/0x1a0
[  +0.000676]  ksys_write+0x4d/0xb0
[  +0.000699]  do_syscall_64+0x3a/0xf0
[  +0.000692]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a4b732a2

bcache: Clean up bch_get_congested() · 3a394727

George Spelvin authored Apr 25, 2019

There are a few nits in this function.  They could in theory all
be separate patches, but that's probably taking small commits
too far.

1) I added a brief comment saying what it does.

2) I like to declare pointer parameters "const" where possible
   for documentation reasons.

3) It uses bitmap_weight(&rand, BITS_PER_LONG) to compute the Hamming
weight of a 32-bit random number (giving a random integer with
mean 16 and variance 8).  Passing by reference in a 64-bit variable
is silly; just use hweight32().

4) Its helper function fract_exp_two is unnecessarily tangled.
Gcc can optimize the multiply by (1 << x) to a shift, but it can
be written in a much more straightforward way at the cost of one
more bit of internal precision.  Some analysis reveals that this
bit is always available.

This shrinks the object code for fract_exp_two(x, 6) from 23 bytes:

0000000000000000 <foo1>:
   0:   89 f9                   mov    %edi,%ecx
   2:   c1 e9 06                shr    $0x6,%ecx
   5:   b8 01 00 00 00          mov    $0x1,%eax
   a:   d3 e0                   shl    %cl,%eax
   c:   83 e7 3f                and    $0x3f,%edi
   f:   d3 e7                   shl    %cl,%edi
  11:   c1 ef 06                shr    $0x6,%edi
  14:   01 f8                   add    %edi,%eax
  16:   c3                      retq

To 19:

0000000000000017 <foo2>:
  17:   89 f8                   mov    %edi,%eax
  19:   83 e0 3f                and    $0x3f,%eax
  1c:   83 c0 40                add    $0x40,%eax
  1f:   89 f9                   mov    %edi,%ecx
  21:   c1 e9 06                shr    $0x6,%ecx
  24:   d3 e0                   shl    %cl,%eax
  26:   c1 e8 06                shr    $0x6,%eax
  29:   c3                      retq

(Verified with 0 <= frac_bits <= 8, 0 <= x < 16<<frac_bits;
both versions produce the same output.)

5) And finally, the call to bch_get_congested() in check_should_bypass()
is separated from the use of the value by multiple tests which
could moot the need to compute it.  Move the computation down to
where it's needed.  This also saves a local register to hold the
computed value.
Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3a394727

bcache: use kmemdup_nul for CACHED_LABEL buffer · 792732d9

Geliang Tang authored Apr 25, 2019

This patch uses kmemdup_nul to create a NUL-terminated string from
dc->sb.label. This is better than open coding it.

With this, we can move env[2] initialization into env[] array to make
code more elegant.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

792732d9

bcache: avoid clang -Wunintialized warning · 78d4eb8a

Arnd Bergmann authored Apr 25, 2019

clang has identified a code path in which it thinks a
variable may be unused:

drivers/md/bcache/alloc.c:333:4: error: variable 'bucket' is used uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
                        fifo_pop(&ca->free_inc, bucket);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
 #define fifo_pop(fifo, i)       fifo_pop_front(fifo, (i))
                                ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/md/bcache/util.h:189:6: note: expanded from macro 'fifo_pop_front'
        if (_r) {                                                       \
            ^~
drivers/md/bcache/alloc.c:343:46: note: uninitialized use occurs here
                        allocator_wait(ca, bch_allocator_push(ca, bucket));
                                                                  ^~~~~~
drivers/md/bcache/alloc.c:287:7: note: expanded from macro 'allocator_wait'
                if (cond)                                               \
                    ^~~~
drivers/md/bcache/alloc.c:333:4: note: remove the 'if' if its condition is always true
                        fifo_pop(&ca->free_inc, bucket);
                        ^
drivers/md/bcache/util.h:219:27: note: expanded from macro 'fifo_pop'
 #define fifo_pop(fifo, i)       fifo_pop_front(fifo, (i))
                                ^
drivers/md/bcache/util.h:189:2: note: expanded from macro 'fifo_pop_front'
        if (_r) {                                                       \
        ^
drivers/md/bcache/alloc.c:331:15: note: initialize the variable 'bucket' to silence this warning
                        long bucket;
                                   ^

This cannot happen in practice because we only enter the loop
if there is at least one element in the list.

Slightly rearranging the code makes this clearer to both the
reader and the compiler, which avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

78d4eb8a

bcache: fix inaccurate result of unused buckets · 4e0c04ec

Guoju Fang authored Apr 25, 2019

To get the amount of unused buckets in sysfs_priority_stats, the code
count the buckets which GC_SECTORS_USED is zero. It's correct and should
not be overwritten by the count of buckets which prio is zero.
Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e0c04ec

bcache: fix crashes stopping bcache device before read miss done · 1568ee7e

Guoju Fang authored Apr 25, 2019

The bio from upper layer is considered completed when bio_complete()
returns. In most scenarios bio_complete() is called in search_free(),
but when read miss happens, the bio_compete() is called when backing
device reading completed, while the struct search is still in use until
cache inserting finished.

If someone stops the bcache device just then, the device may be closed
and released, but after cache inserting finished the struct search will
access a freed struct cached_dev.

This patch add the reference of bcache device before bio_complete() when
read miss happens, and put it after the search is not used.
Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1568ee7e

block: don't run get_page() on pages from non-bvec iov iter · 0257c0ed

Ming Lei authored Apr 24, 2019

The refcount has been increased for pages retrieved from non-bvec iov iter
via __bio_iov_iter_get_pages(), so don't need to do that again.

Otherwise, IO pages are leaked easily.

Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Fixes: 7321ecbf ("block: change how we get page references in bio_iov_iter_get_pages")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0257c0ed

23 Apr, 2019 1 commit

block: clarify that bio_add_page() and related helpers can add multi pages · 551879a4

Ming Lei authored Apr 23, 2019

bio_add_page() and __bio_add_page() are capable of adding pages into
bio, and now we have at least two such usages alreay:

	- __bio_iov_bvec_add_pages()
	- nvmet_bdev_execute_rw().

So update comments on these two helpers.

The thing is a bit special for __bio_try_merge_page(), given the caller
needs to know if the new added page is same with the last added page,
then it isn't safe to pass multi-page in case that 'same_page' is true,
so adds warning on potential misuse, and updates comment on
__bio_try_merge_page().

Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

551879a4

22 Apr, 2019 6 commits

Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.2/block · 6c88d735

Jens Axboe authored Apr 22, 2019

Pull MD fixes from Song.

* 'md-next' of https://github.com/liu-song-6/linux:
  md/raid: raid5 preserve the writeback action after the parity check
  Revert "Don't jump to compute_result state from check_result state"
  md: return -ENODEV if rdev has no mddev assigned
  block: fix use-after-free on gendisk

6c88d735

block: don't show io_timeout if driver has no timeout handler · 4d25339e

Weiping Zhang authored Apr 02, 2019

If the low level driver has no timeout handler, the
/sys/block/<disk>/queue/io_timeout will not be displayed.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4d25339e

block: avoid scatterlist offsets > PAGE_SIZE · f9f76879

Christoph Hellwig authored Apr 19, 2019

While we generally allow scatterlists to have offsets larger than page
size for an entry, and other subsystems like the crypto code make use of
that, the block layer isn't quite ready for that.  Flip the switch back
to avoid them for now, and revisit that decision early in a merge window
once the known offenders are fixed.

Fixes: 8a96a0e4 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f9f76879

brd: re-enable __GFP_HIGHMEM in brd_insert_page() · f6b50160

Hou Tao authored Apr 22, 2019

__GFP_HIGHMEM is disabled if dax is enabled on brd, however
dax support for brd has been removed since commit (7a862fbb
"brd: remove dax support"), so restore __GFP_HIGHMEM in
brd_insert_page().

Also remove the no longer applicable comments about DAX and highmem.

Cc: stable@vger.kernel.org
Fixes: 7a862fbb ("brd: remove dax support")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6b50160

block: fix use-after-free on gendisk · 6fcc44d1

Yufen Yu authored Apr 02, 2019

commit 2da78092 "block: Fix dev_t minor allocation lifetime"
specifically moved blk_free_devt(dev->devt) call to part_release()
to avoid reallocating device number before the device is fully
shutdown.

However, it can cause use-after-free on gendisk in get_gendisk().
We use md device as example to show the race scenes:

Process1		Worker			Process2
md_free
						blkdev_open
del_gendisk
  add delete_partition_work_fn() to wq
  						__blkdev_get
						get_gendisk
put_disk
  disk_release
    kfree(disk)
    						find part from ext_devt_idr
						get_disk_and_module(disk)
    					  	cause use after free

    			delete_partition_work_fn
			put_device(part)
    		  	part_release
		    	remove part from ext_devt_idr

Before <devt, hd_struct pointer> is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But, if we access the gendisk after
it have been freed, it can cause in use-after-freeon gendisk in
get_gendisk().

We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.

Thanks to Jan Kara for providing the solution and more clear comments
for the code.

Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6fcc44d1

Merge tag 'v5.1-rc6' into for-5.2/block · 5c61ee2c

Jens Axboe authored Apr 22, 2019

Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a
comment, and is trivial. The other one is a conflict due to a later fix
in the bio multi-page work, and needs a bit more care.

* tag 'v5.1-rc6': (770 commits)
  Linux 5.1-rc6
  block: make sure that bvec length can't be overflow
  block: kill all_q_node in request_queue
  x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
  coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
  mm/kmemleak.c: fix unused-function warning
  init: initialize jump labels before command line option parsing
  kernel/watchdog_hld.c: hard lockup message should end with a newline
  kcov: improve CONFIG_ARCH_HAS_KCOV help text
  mm: fix inactive list balancing between NUMA nodes and cgroups
  mm/hotplug: treat CMA pages as unmovable
  proc: fixup proc-pid-vm test
  proc: fix map_files test on F29
  mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
  mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
  mm: swapoff: shmem_unuse() stop eviction without igrab()
  mm: swapoff: take notice of completion sooner
  mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
  mm: swapoff: shmem_find_swap_entries() filter out other types
  slab: store tagged freelist for off-slab slabmgmt
  ...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5c61ee2c

21 Apr, 2019 1 commit
- Linux 5.1-rc6 · 085b7755
  Linus Torvalds authored Apr 21, 2019
  
  085b7755
20 Apr, 2019 11 commits

Merge tag 'nfs-for-5.1-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 9e5de623

Linus Torvalds authored Apr 20, 2019

Pull NFS client bugfix from Trond Myklebust:
 "Fix a regression in which an RPC call can be tagged with an error
  despite the transmission being successful"

* tag 'nfs-for-5.1-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Ignore queue transmission errors on successful transmission

9e5de623

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · a06bc2f2

Linus Torvalds authored Apr 20, 2019

Pull SCSI fixes from James Bottomley:
 "Three minor fixes: two obvious ones in drivers and a fix to the SG_IO
  path to correctly return status on error"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: aic7xxx: fix EISA support
  Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
  scsi: core: set result when the command cannot be dispatched

a06bc2f2

Merge tag 'for-linus-20190420' of git://git.kernel.dk/linux-block · 38a2ca2c

Linus Torvalds authored Apr 20, 2019

Pull block fixes from Jens Axboe:
 "A set of small fixes that should go into this series. This contains:

   - Removal of unused queue member (Hou)

   - Overflow bvec fix (Ming)

   - Various little io_uring tweaks (me)
       - kthread parking
       - Only call cpu_possible() for verified CPU
       - Drop unused 'file' argument to io_file_put()
       - io_uring_enter vs io_uring_register deadlock fix
       - CQ overflow fix

   - BFQ internal depth update fix (me)"

* tag 'for-linus-20190420' of git://git.kernel.dk/linux-block:
  block: make sure that bvec length can't be overflow
  block: kill all_q_node in request_queue
  io_uring: fix CQ overflow condition
  io_uring: fix possible deadlock between io_uring_{enter,register}
  io_uring: drop io_file_put() 'file' argument
  bfq: update internal depth state when queue depth changes
  io_uring: only test SQPOLL cpu after we've verified it
  io_uring: park SQPOLL thread if it's percpu

38a2ca2c

Merge tag 'i3c/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · 34396bdf

Linus Torvalds authored Apr 20, 2019

Pill i3c fixes from Boris Brezillon:

 - fix the random PID check

 - fix the disable controller logic in the designware driver

 - fix I3C entry in MAINTAINERS

* tag 'i3c/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  MAINTAINERS: Fix the I3C entry
  i3c: dw: Fix dw_i3c_master_disable controller by using correct mask
  i3c: Fix the verification of random PID

34396bdf

Merge tag 'sound-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 4b609f1e

Linus Torvalds authored Apr 20, 2019

Pull sound fixes from Takashi Iwai:
 "Two core fixes for long-standing bugs for the races at concurrent
  device creation and deletion that were (unsurprisingly) spotted by
  syzkaller with usb-fuzzer.

  The rest are usual small HD-audio fixes"

* tag 'sound-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - add two more pin configuration sets to quirk table
  ALSA: core: Fix card races between register and disconnect
  ALSA: info: Fix racy addition/deletion of nodes
  ALSA: hda: Initialize power_state field properly

4b609f1e

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e899cc3b

Linus Torvalds authored Apr 20, 2019

Pull timer fixes from Ingo Molnar:
 "Misc clocksource driver fixes, and a sched-clock wrapping fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()
  clocksource/drivers/timer-ti-dm: Remove omap_dm_timer_set_load_start
  clocksource/drivers/oxnas: Fix OX820 compatible
  clocksource/drivers/arm_arch_timer: Remove unneeded pr_fmt macro
  clocksource/drivers/npcm: select TIMER_OF

e899cc3b

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b25c69b9

Linus Torvalds authored Apr 20, 2019

Pull perf fixes from Ingo Molnar:
 "Misc fixes:
   - various tooling fixes
   - kretprobe fixes
   - kprobes annotation fixes
   - kprobes error checking fix
   - fix the default events for AMD Family 17h CPUs
   - PEBS fix
   - AUX record fix
   - address filtering fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Avoid kretprobe recursion bug
  kprobes: Mark ftrace mcount handler functions nokprobe
  x86/kprobes: Verify stack frame on kretprobe
  perf/x86/amd: Add event map for AMD Family 17h
  perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf()
  perf tools: Fix map reference counting
  perf evlist: Fix side band thread draining
  perf tools: Check maps for bpf programs
  perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info()
  tools include uapi: Sync sound/asound.h copy
  perf top: Always sample time to satisfy needs of use of ordered queuing
  perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user)
  tools lib traceevent: Fix missing equality check for strcmp
  perf stat: Disable DIR_FORMAT feature for 'perf stat record'
  perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view
  perf header: Fix lock/unlock imbalances when processing BPF/BTF info
  perf/x86: Fix incorrect PEBS_REGS
  perf/ring_buffer: Fix AUX record suppression
  perf/core: Fix the address filtering fix
  kprobes: Fix error check when reusing optimized probes

b25c69b9

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1fd91d71

Linus Torvalds authored Apr 20, 2019

Pull x86 fixes from Ingo Molnar:
 "Misc fixes all over the place: a console spam fix, section attributes
  fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot
  quirk, boot options related warnings fix, an LTO fix, a deadlock fix
  and an RDT fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
  x86/cpu/bugs: Use __initconst for 'const' init data
  x86/mm/KASLR: Fix the size of the direct mapping section
  x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness"
  x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info"
  x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T
  x86/mm: Prevent bogus warnings with "noexec=off"
  x86/build/lto: Fix truncated .bss with -fdata-sections
  x86/speculation: Prevent deadlock on ssb_state::lock
  x86/resctrl: Do not repeat rdtgroup mode initialization

1fd91d71

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b4cf585

Linus Torvalds authored Apr 20, 2019

Pull scheduler fixes from Ingo Molnar:
 "A deadline scheduler warning/race fix, and a cfs_period_us quota
  calculation workaround where the real fix looks too involved to merge
  immediately"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Correctly handle active 0-lag timers
  sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

2b4cf585

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · de3af9a9

Linus Torvalds authored Apr 20, 2019

Pull locking fixes from Ingo Molnar:
 "A lockdep warning fix and a script execution fix when atomics are
  generated"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomics: Don't assume that scripts are executable
  locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again

de3af9a9

Merge branch 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 371dd432

Linus Torvalds authored Apr 19, 2019

Pull cgroup fix from Tejun Heo:
 "A patch to fix a RCU imbalance error in the devices cgroup
  configuration error path"

* 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  device_cgroup: fix RCU imbalance in error case

371dd432

19 Apr, 2019 11 commits

Merge branch 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · 4c3f49ae

Linus Torvalds authored Apr 19, 2019

Pull percpu fixlet from Dennis Zhou:
 "This stops printing the base address of percpu memory on
  initialization"

* 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  percpu: stop printing kernel addresses

4c3f49ae

Merge tag 'tty-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 55e3a6ba

Linus Torvalds authored Apr 19, 2019

Pull tty/serial fixes from Greg KH:
 "Here are five small fixes for some tty/serial/vt issues that have been
  reported.

  The vt one has been around for a while, it is good to finally get that
  resolved. The others fix a build warning that showed up in 5.1-rc1,
  and resolve a problem in the sh-sci driver.

  Note, the second patch for build warning fix for the sc16is7xx driver
  was just applied to the tree, as it resolves a problem with the
  previous patch to try to solve the issue. It has not shown up in
  linux-next yet, unlike all of the other patches, but it has passed
  0-day testing and everyone seems to agree that it is correct"

* tag 'tty-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  sc16is7xx: put err_spi and err_i2c into correct #ifdef
  vt: fix cursor when clearing the screen
  sc16is7xx: move label 'err_spi' to correct section
  serial: sh-sci: Fix HSCIF RX sampling point adjustment
  serial: sh-sci: Fix HSCIF RX sampling point calculation

55e3a6ba

Merge branch 'akpm' (patches from Andrew) · 3ecafda9

Linus Torvalds authored Apr 19, 2019

Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
  mm/kmemleak.c: fix unused-function warning
  init: initialize jump labels before command line option parsing
  kernel/watchdog_hld.c: hard lockup message should end with a newline
  kcov: improve CONFIG_ARCH_HAS_KCOV help text
  mm: fix inactive list balancing between NUMA nodes and cgroups
  mm/hotplug: treat CMA pages as unmovable
  proc: fixup proc-pid-vm test
  proc: fix map_files test on F29
  mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
  mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
  mm: swapoff: shmem_unuse() stop eviction without igrab()
  mm: swapoff: take notice of completion sooner
  mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
  mm: swapoff: shmem_find_swap_entries() filter out other types
  slab: store tagged freelist for off-slab slabmgmt

3ecafda9

Merge tag 'staging-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · b222e9af

Linus Torvalds authored Apr 19, 2019

Pull staging and IIO fixes from Greg KH:
 "Here is a bunch of IIO driver fixes, and some smaller staging driver
  fixes, for 5.1-rc6. The IIO fixes were delayed due to my vacation, but
  all resolve a number of reported issues and have been in linux-next
  for a few weeks with no reported issues.

  The other staging driver fixes are all tiny, resolving some reported
  issues in the comedi and most drivers, as well as some erofs fixes.

  All of these patches have been in linux-next with no reported issues"

* tag 'staging-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (24 commits)
  staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
  staging: comedi: ni_usb6501: Fix use of uninitialized mutex
  staging: erofs: fix unexpected out-of-bound data access
  staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
  staging: comedi: vmk80xx: Fix use of uninitialized semaphore
  staging: most: core: use device description as name
  iio: core: fix a possible circular locking dependency
  iio: ad_sigma_delta: select channel when reading register
  iio: pms7003: select IIO_TRIGGERED_BUFFER
  iio: cros_ec: Fix the maths for gyro scale calculation
  iio: adc: xilinx: prevent touching unclocked h/w on remove
  iio: adc: xilinx: fix potential use-after-free on probe
  iio: adc: xilinx: fix potential use-after-free on remove
  iio: dac: mcp4725: add missing powerdown bits in store eeprom
  io: accel: kxcjk1013: restore the range after resume.
  iio:chemical:bme680: Fix SPI read interface
  iio:chemical:bme680: Fix, report temperature in millidegrees
  iio: chemical: fix missing Kconfig block for sgp30
  iio: adc: at91: disable adc channel interrupt in timeout case
  iio: gyro: mpu3050: fix chip ID reading
  ...

b222e9af

Merge tag 'char-misc-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · f9764dd4

Linus Torvalds authored Apr 19, 2019

Pull char/misc fixes from Greg KH:
 "Here are four small misc driver fixes for 5.1-rc6.

  Nothing major at all, they fix up a Kconfig issues, a SPDX invalid
  license tag, and two tiny bugfixes.

  All have been in linux-next for a while with no reported issues"

* tag 'char-misc-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  drivers: power: supply: goldfish_battery: Fix bogus SPDX identifier
  extcon: ptn5150: fix COMPILE_TEST dependencies
  misc: fastrpc: add checked value for dma_set_mask
  habanalabs: remove low credit limit of DMA #0

f9764dd4

block: make sure that bvec length can't be overflow · 6bedf00e

Ming Lei authored Apr 17, 2019

bvec->bv_offset may be bigger than PAGE_SIZE sometimes, such as,
when one bio is splitted in the middle of one bvec via bio_split(),
and bi_iter.bi_bvec_done is used to build offset of the 1st bvec of
remained bio. And the remained bio's bvec may be re-submitted to fs
layer via ITER_IBVEC, such as loop and nvme-loop.

So we have to make sure that every bvec's offset is less than
PAGE_SIZE from bio_for_each_segment_all() because some drivers(loop,
nvme-loop) passes the splitted bvec to fs layer via ITER_BVEC.

This patch fixes this issue reported by Zhang Yi When running nvme/011.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 6dc4f100 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6bedf00e

block: kill all_q_node in request_queue · b40fabc0

Hou Tao authored Apr 19, 2019

all_q_node has not been used since commit 4b855ad3 ("blk-mq: Create
hctx for each present CPU"), so remove it.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b40fabc0

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 240206fc

Linus Torvalds authored Apr 19, 2019

Pull input updates from Dmitry Torokhov:

 - several new key mappings for HID

 - a host of new ACPI IDs used to identify Elan touchpads in Lenovo
   laptops

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
  HID: input: add mapping for "Toggle Display" key
  HID: input: add mapping for "Full Screen" key
  HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
  HID: input: add mapping for Expose/Overview key
  HID: input: fix mapping of aspect ratio key
  [media] doc-rst: switch to new names for Full Screen/Aspect keys
  Input: document meanings of KEY_SCREEN and KEY_ZOOM
  Input: elan_i2c - add hardware ID for multiple Lenovo laptops

240206fc

x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority · 2ee27796

Hans de Goede authored Dec 30, 2018

The "ENERGY_PERF_BIAS: Set to 'normal', was 'performance'" message triggers
on pretty much every Intel machine. The purpose of log messages with
a warning level is to notify the user of something which potentially is
a problem, or at least somewhat unexpected.

This message clearly does not match those criteria, so lower its log
priority from warning to info.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181230172715.17469-1-hdegoede@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

2ee27796

Merge tag 'perf-urgent-for-mingo-5.1-20190419' of... · 7579dfc4

Ingo Molnar authored Apr 19, 2019

Merge tag 'perf-urgent-for-mingo-5.1-20190419' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

perf top:

  Jiri Olsa:

  - Fix 'perf top --pid', it needs PERF_SAMPLE_TIME since we switched to using
    a different thread to sort the events and then even for just a single
    thread we now need timestamps.

BPF:

  Jiri Olsa:

  - Fix bpf_prog and btf lookup functions failure path to to properly return
    NULL.

  - Fix side band thread draining, used to process PERF_RECORD_BPF_EVENT
    metadata records.

core:

  Jiri Olsa:

  - Fix map lookup by name to get a refcount when the name is already in
    the tree. Found

  Song Liu:

  - Fix __map__is_kmodule() by taking into account recently added BPF
    maps.

UAPI:

  Arnaldo Carvalho de Melo:

  - Sync sound/asound.h copy
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7579dfc4

coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping · 04f5866e

Andrea Arcangeli authored Apr 18, 2019

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04f5866e