1. 12 Feb, 2019 40 commits
    • Brian Foster's avatar
      xfs: fix shared extent data corruption due to missing cow reservation · c3a66bf4
      Brian Foster authored
      commit 59e42931 upstream.
      
      Page writeback indirectly handles shared extents via the existence
      of overlapping COW fork blocks. If COW fork blocks exist, writeback
      always performs the associated copy-on-write regardless if the
      underlying blocks are actually shared. If the blocks are shared,
      then overlapping COW fork blocks must always exist.
      
      fstests shared/010 reproduces a case where a buffered write occurs
      over a shared block without performing the requisite COW fork
      reservation.  This ultimately causes writeback to the shared extent
      and data corruption that is detected across md5 checks of the
      filesystem across a mount cycle.
      
      The problem occurs when a buffered write lands over a shared extent
      that crosses an extent size hint boundary and that also happens to
      have a partial COW reservation that doesn't cover the start and end
      blocks of the data fork extent.
      
      For example, a buffered write occurs across the file offset (in FSB
      units) range of [29, 57]. A shared extent exists at blocks [29, 35]
      and COW reservation already exists at blocks [32, 34]. After
      accommodating a COW extent size hint of 32 blocks and the existing
      reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
      blocks of reservation at offset 0 and returns with COW reservation
      across the range of [0, 34]. The associated data fork extent is
      still [29, 35], however, which isn't fully covered by the COW
      reservation.
      
      This leads to a buffered write at file offset 35 over a shared
      extent without associated COW reservation. Writeback eventually
      kicks in, performs an overwrite of the underlying shared block and
      causes the associated data corruption.
      
      Update xfs_reflink_reserve_cow() to accommodate the fact that a
      delalloc allocation request may not fully cover the extent in the
      data fork. Trim the data fork extent appropriately, just as is done
      for shared extent boundaries and/or existing COW reservations that
      happen to overlap the start of the data fork extent. This prevents
      shared/010 failures due to data corruption on reflink enabled
      filesystems.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c3a66bf4
    • Dave Chinner's avatar
      xfs: fix overflow in xfs_attr3_leaf_verify · a96f3a55
      Dave Chinner authored
      commit 837514f7 upstream.
      
      generic/070 on 64k block size filesystems is failing with a verifier
      corruption on writeback or an attribute leaf block:
      
      [   94.973083] XFS (pmem0): Metadata corruption detected at xfs_attr3_leaf_verify+0x246/0x260, xfs_attr3_leaf block 0x811480
      [   94.975623] XFS (pmem0): Unmount and run xfs_repair
      [   94.976720] XFS (pmem0): First 128 bytes of corrupted metadata buffer:
      [   94.978270] 000000004b2e7b45: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
      [   94.980268] 000000006b1db90b: 00 00 00 00 00 81 14 80 00 00 00 00 00 00 00 00  ................
      [   94.982251] 00000000433f2407: 22 7b 5c 82 2d 5c 47 4c bb 31 1c 37 fa a9 ce d6  "{\.-\GL.1.7....
      [   94.984157] 0000000010dc7dfb: 00 00 00 00 00 81 04 8a 00 0a 18 e8 dd 94 01 00  ................
      [   94.986215] 00000000d5a19229: 00 a0 dc f4 fe 98 01 68 f0 d8 07 e0 00 00 00 00  .......h........
      [   94.988171] 00000000521df36c: 0c 2d 32 e2 fe 20 01 00 0c 2d 58 65 fe 0c 01 00  .-2.. ...-Xe....
      [   94.990162] 000000008477ae06: 0c 2d 5b 66 fe 8c 01 00 0c 2d 71 35 fe 7c 01 00  .-[f.....-q5.|..
      [   94.992139] 00000000a4a6bca6: 0c 2d 72 37 fc d4 01 00 0c 2d d8 b8 f0 90 01 00  .-r7.....-......
      [   94.994789] XFS (pmem0): xfs_do_force_shutdown(0x8) called from line 1453 of file fs/xfs/xfs_buf.c. Return address = ffffffff815365f3
      
      This is failing this check:
      
                      end = ichdr.freemap[i].base + ichdr.freemap[i].size;
                      if (end < ichdr.freemap[i].base)
      >>>>>                   return __this_address;
                      if (end > mp->m_attr_geo->blksize)
                              return __this_address;
      
      And from the buffer output above, the freemap array is:
      
      	freemap[0].base = 0x00a0
      	freemap[0].size = 0xdcf4	end = 0xdd94
      	freemap[1].base = 0xfe98
      	freemap[1].size = 0x0168	end = 0x10000
      	freemap[2].base = 0xf0d8
      	freemap[2].size = 0x07e0	end = 0xf8b8
      
      These all look valid - the block size is 0x10000 and so from the
      last check in the above verifier fragment we know that the end
      of freemap[1] is valid. The problem is that end is declared as:
      
      	uint16_t	end;
      
      And (uint16_t)0x10000 = 0. So we have a verifier bug here, not a
      corruption. Fix the verifier to use uint32_t types for the check and
      hence avoid the overflow.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=201577Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a96f3a55
    • Christophe JAILLET's avatar
      xfs: Fix error code in 'xfs_ioc_getbmap()' · b6095cbd
      Christophe JAILLET authored
      commit 132bf672 upstream.
      
      In this function, once 'buf' has been allocated, we unconditionally
      return 0.
      However, 'error' is set to some error codes in several error handling
      paths.
      Before commit 232b5194 ("xfs: simplify the xfs_getbmap interface")
      this was not an issue because all error paths were returning directly,
      but now that some cleanup at the end may be needed, we must propagate the
      error code.
      
      Fixes: 232b5194 ("xfs: simplify the xfs_getbmap interface")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b6095cbd
    • Christoph Hellwig's avatar
      xfs: cancel COW blocks before swapext · a585ac0e
      Christoph Hellwig authored
      commit 96987eea upstream.
      
      We need to make sure we have no outstanding COW blocks before we swap
      extents, as there is nothing preventing us from having preallocated COW
      delalloc on either inode that swapext is called on.  That case can
      easily be reproduced by running generic/324 in always_cow mode:
      
      [  620.760572] XFS: Assertion failed: tip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap_util.c, line: 1669
      [  620.761608] ------------[ cut here ]------------
      [  620.762171] kernel BUG at fs/xfs/xfs_message.c:102!
      [  620.762732] invalid opcode: 0000 [#1] SMP PTI
      [  620.763272] CPU: 0 PID: 24153 Comm: xfs_fsr Tainted: G        W         4.19.0-rc1+ #4182
      [  620.764203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
      [  620.765202] RIP: 0010:assfail+0x20/0x28
      [  620.765646] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
      [  620.767758] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
      [  620.768359] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
      [  620.769174] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
      [  620.769982] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
      [  620.770788] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
      [  620.771638] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
      [  620.772504] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
      [  620.773475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  620.774168] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
      [  620.774978] Call Trace:
      [  620.775274]  xfs_swap_extent_forks+0x2a0/0x2e0
      [  620.775792]  xfs_swap_extents+0x38b/0xab0
      [  620.776256]  xfs_ioc_swapext+0x121/0x140
      [  620.776709]  xfs_file_ioctl+0x328/0xc90
      [  620.777154]  ? rcu_read_lock_sched_held+0x50/0x60
      [  620.777694]  ? xfs_iunlock+0x233/0x260
      [  620.778127]  ? xfs_setattr_nonsize+0x3be/0x6a0
      [  620.778647]  do_vfs_ioctl+0x9d/0x680
      [  620.779071]  ? ksys_fchown+0x47/0x80
      [  620.779552]  ksys_ioctl+0x35/0x70
      [  620.780040]  __x64_sys_ioctl+0x11/0x20
      [  620.780530]  do_syscall_64+0x4b/0x190
      [  620.780927]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  620.781467] RIP: 0033:0x7fdc364d0f07
      [  620.781900] Code: b3 66 90 48 8b 05 81 5f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 28
      [  620.784044] RSP: 002b:00007ffe2a766038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  620.784896] RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fdc364d0f07
      [  620.785667] RDX: 0000560296ca2fc0 RSI: 00000000c0c0586d RDI: 0000000000000005
      [  620.786398] RBP: 0000000000000025 R08: 0000000000001200 R09: 0000000000000000
      [  620.787283] R10: 0000000000000432 R11: 0000000000000246 R12: 0000000000000005
      [  620.788051] R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000006
      [  620.788927] Modules linked in:
      [  620.789340] ---[ end trace 9503b7417ffdbdb0 ]---
      [  620.790065] RIP: 0010:assfail+0x20/0x28
      [  620.790642] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
      [  620.793038] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
      [  620.793609] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
      [  620.794317] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
      [  620.795025] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
      [  620.795778] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
      [  620.796675] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
      [  620.797782] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
      [  620.798908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  620.799594] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
      [  620.800424] Kernel panic - not syncing: Fatal exception
      [  620.801191] Kernel Offset: disabled
      [  620.801597] ---[ end Kernel panic - not syncing: Fatal exception ]---
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a585ac0e
    • Carlos Maiolino's avatar
      xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat · 62c7c0a8
      Carlos Maiolino authored
      commit 41657e55 upstream.
      
      The addition of FIBT, RMAP and REFCOUNT changed the offsets into
      __xfssats structure.
      
      This caused xqmstat_proc_show() to display garbage data via
      /proc/fs/xfs/xqmstat, once it relies on the offsets marked via macros.
      
      Fix it.
      
      Fixes: 00f4e4f9 xfs: add rmap btree stats infrastructure
      Fixes: aafc3c24 xfs: support the XFS_BTNUM_FINOBT free inode btree type
      Fixes: 46eeb521 xfs: introduce refcount btree definitions
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      62c7c0a8
    • Du Changbin's avatar
      scripts/gdb: fix lx-version string output · aacb2ab1
      Du Changbin authored
      [ Upstream commit b058809b ]
      
      A bug is present in GDB which causes early string termination when
      parsing variables.  This has been reported [0], but we should ensure
      that we can support at least basic printing of the core kernel strings.
      
      For current gdb version (has been tested with 7.3 and 8.1), 'lx-version'
      only prints one character.
      
        (gdb) lx-version
        L(gdb)
      
      This can be fixed by casting 'linux_banner' as (char *).
      
        (gdb) lx-version
        Linux version 4.19.0-rc1+ (changbin@acer) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #21 SMP Sat Sep 1 21:43:30 CST 2018
      
      [0] https://sourceware.org/bugzilla/show_bug.cgi?id=20077
      
      [kbingham@kernel.org: add detail to commit message]
      Link: http://lkml.kernel.org/r/20181111162035.8356-1-kieran.bingham@ideasonboard.com
      Fixes: 2d061d99 ("scripts/gdb: add version command")
      Signed-off-by: default avatarDu Changbin <changbin.du@gmail.com>
      Signed-off-by: default avatarKieran Bingham <kbingham@kernel.org>
      Acked-by: default avatarJan Kiszka <jan.kiszka@siemens.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      aacb2ab1
    • Anders Roxell's avatar
      kernel/kcov.c: mark write_comp_data() as notrace · 58e57bcb
      Anders Roxell authored
      [ Upstream commit 63472443 ]
      
      Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the
      function called from __sanitizer_cov_trace_const_cmp4 shouldn't be
      traceable either.  ftrace_graph_caller() gets called every time func
      write_comp_data() gets called if it isn't marked 'notrace'.  This is the
      backtrace from gdb:
      
       #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
       #1  0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
       #2  0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116
       #3  0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
       #4  0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27
       #5  0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
      
      Rework so that write_comp_data() that are called from
      __sanitizer_cov_trace_*_cmp*() are marked as 'notrace'.
      
      Commit 903e8ff8 ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace")
      missed to mark write_comp_data() as 'notrace'. When that patch was
      created gcc-7 was used. In lib/Kconfig.debug
      config KCOV_ENABLE_COMPARISONS
      	depends on $(cc-option,-fsanitize-coverage=trace-cmp)
      
      That code path isn't hit with gcc-7. However, it were that with gcc-8.
      
      Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.orgSigned-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Co-developed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      58e57bcb
    • Oleg Nesterov's avatar
      exec: load_script: don't blindly truncate shebang string · ab5f7407
      Oleg Nesterov authored
      [ Upstream commit 8099b047 ]
      
      load_script() simply truncates bprm->buf and this is very wrong if the
      length of shebang string exceeds BINPRM_BUF_SIZE-2.  This can silently
      truncate i_arg or (worse) we can execute the wrong binary if buf[2:126]
      happens to be the valid executable path.
      
      Change load_script() to return ENOEXEC if it can't find '\n' or zero in
      bprm->buf.  Note that '\0' can come from either
      prepare_binprm()->memset() or from kernel_read(), we do not care.
      
      Link: http://lkml.kernel.org/r/20181112160931.GA28463@redhat.comSigned-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Ben Woodard <woodard@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ab5f7407
    • Davidlohr Bueso's avatar
      fs/epoll: drop ovflist branch prediction · 9cb8f808
      Davidlohr Bueso authored
      [ Upstream commit 76699a67 ]
      
      The ep->ovflist is a secondary ready-list to temporarily store events
      that might occur when doing sproc without holding the ep->wq.lock.  This
      accounts for every time we check for ready events and also send events
      back to userspace; both callbacks, particularly the latter because of
      copy_to_user, can account for a non-trivial time.
      
      As such, the unlikely() check to see if the pointer is being used, seems
      both misleading and sub-optimal.  In fact, we go to an awful lot of
      trouble to sync both lists, and populating the ovflist is far from an
      uncommon scenario.
      
      For example, profiling a concurrent epoll_wait(2) benchmark, with
      CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for a two threads a 33%
      incorrect rate was seen; and when incrementally increasing the number of
      epoll instances (which is used, for example for multiple queuing load
      balancing models), up to a 90% incorrect rate was seen.
      
      Similarly, by deleting the prediction, 3% throughput boost was seen
      across incremental threads.
      
      Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.netSigned-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jason Baron <jbaron@akamai.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9cb8f808
    • Liu, Chuansheng's avatar
      kernel/hung_task.c: force console verbose before panic · f0d32c54
      Liu, Chuansheng authored
      [ Upstream commit 168e06f7 ]
      
      Based on commit 401c636a ("kernel/hung_task.c: show all hung tasks
      before panic"), we could get the call stack of hung task.
      
      However, if the console loglevel is not high, we still can not see the
      useful panic information in practice, and in most cases users don't set
      console loglevel to high level.
      
      This patch is to force console verbose before system panic, so that the
      real useful information can be seen in the console, instead of being
      like the following, which doesn't have hung task information.
      
        INFO: task init:1 blocked for more than 120 seconds.
              Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        Kernel panic - not syncing: hung_task: blocked tasks
        CPU: 2 PID: 479 Comm: khungtaskd Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
        Call Trace:
         dump_stack+0x4f/0x65
         panic+0xde/0x231
         watchdog+0x290/0x410
         kthread+0x12c/0x150
         ret_from_fork+0x35/0x40
        reboot: panic mode set: p,w
        Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.comSigned-off-by: default avatarChuansheng Liu <chuansheng.liu@intel.com>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f0d32c54
    • Cheng Lin's avatar
      proc/sysctl: fix return error for proc_doulongvec_minmax() · 9beb84c0
      Cheng Lin authored
      [ Upstream commit 09be1784 ]
      
      If the number of input parameters is less than the total parameters, an
      EINVAL error will be returned.
      
      For example, we use proc_doulongvec_minmax to pass up to two parameters
      with kern_table:
      
      {
      	.procname       = "monitor_signals",
      	.data           = &monitor_sigs,
      	.maxlen         = 2*sizeof(unsigned long),
      	.mode           = 0644,
      	.proc_handler   = proc_doulongvec_minmax,
      },
      
      Reproduce:
      
      When passing two parameters, it's work normal.  But passing only one
      parameter, an error "Invalid argument"(EINVAL) is returned.
      
        [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
        [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
        1       2
        [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
        -bash: echo: write error: Invalid argument
        [root@cl150 ~]# echo $?
        1
        [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
        3       2
        [root@cl150 ~]#
      
      The following is the result after apply this patch.  No error is
      returned when the number of input parameters is less than the total
      parameters.
      
        [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
        [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
        1       2
        [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
        [root@cl150 ~]# echo $?
        0
        [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
        3       2
        [root@cl150 ~]#
      
      There are three processing functions dealing with digital parameters,
      __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.
      
      This patch deals with __do_proc_doulongvec_minmax, just as
      __do_proc_dointvec does, adding a check for parameters 'left'.  In
      __do_proc_douintvec, its code implementation explicitly does not support
      multiple inputs.
      
      static int __do_proc_douintvec(...){
               ...
               /*
                * Arrays are not supported, keep this simple. *Do not* add
                * support for them.
                */
               if (vleft != 1) {
                       *lenp = 0;
                       return -EINVAL;
               }
               ...
      }
      
      So, just __do_proc_doulongvec_minmax has the problem.  And most use of
      proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
      parameter.
      
      Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cnSigned-off-by: default avatarCheng Lin <cheng.lin130@zte.com.cn>
      Acked-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9beb84c0
    • Tetsuo Handa's avatar
      kernel/hung_task.c: break RCU locks based on jiffies · 9c8939b0
      Tetsuo Handa authored
      [ Upstream commit 304ae427 ]
      
      check_hung_uninterruptible_tasks() is currently calling rcu_lock_break()
      for every 1024 threads.  But check_hung_task() is very slow if printk()
      was called, and is very fast otherwise.
      
      If many threads within some 1024 threads called printk(), the RCU grace
      period might be extended enough to trigger RCU stall warnings.
      Therefore, calling rcu_lock_break() for every some fixed jiffies will be
      safer.
      
      Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jpSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9c8939b0
    • Dave Martin's avatar
      arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition · d69ad39a
      Dave Martin authored
      [ Upstream commit ee1b465b ]
      
      SVE_PT_REGS_OFFSET is supposed to indicate the offset for skipping
      over the ptrace NT_ARM_SVE header (struct user_sve_header) to the
      start of the SVE register data proper.
      
      However, currently SVE_PT_REGS_OFFSET is defined in terms of struct
      sve_context, which is wrong: that structure describes the SVE
      header in the signal frame, not in the ptrace regset.
      
      This patch fixes the definition to use the ptrace header structure
      struct user_sve_header instead.
      
      By good fortune, the two structures are the same size anyway, so
      there is no functional or ABI change.
      Signed-off-by: default avatarDave Martin <Dave.Martin@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d69ad39a
    • Aditya Pakki's avatar
      HID: lenovo: Add checks to fix of_led_classdev_register · d921bb16
      Aditya Pakki authored
      [ Upstream commit 6ae16dfb ]
      
      In lenovo_probe_tpkbd(), the function of_led_classdev_register() could
      return an error value that is unchecked. The fix adds these checks.
      Signed-off-by: default avatarAditya Pakki <pakki001@umn.edu>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d921bb16
    • Bjorn Andersson's avatar
      thermal: generic-adc: Fix adc to temp interpolation · ec8f73c2
      Bjorn Andersson authored
      [ Upstream commit 9d216211 ]
      
      First correct the edge case to return the last element if we're
      outside the range, rather than at the last element, so that
      interpolation is not omitted for points between the two last entries in
      the table.
      
      Then correct the formula to perform linear interpolation based the two
      points surrounding the read ADC value. The indices for temp are kept as
      "hi" and "lo" to pair with the adc indices, but there's no requirement
      that the temperature is provided in descendent order. mult_frac() is
      used to prevent issues with overflowing the int.
      
      Cc: Laxman Dewangan <ldewangan@nvidia.com>
      Signed-off-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ec8f73c2
    • Richard Zhu's avatar
      PCI: imx: Enable MSI from downstream components · 89c18358
      Richard Zhu authored
      [ Upstream commit 75cb8d20 ]
      
      The MSI Enable bit in the MSI Capability (PCIe r4.0, sec 7.7.1.2) controls
      whether a Function can request service using MSI.
      
      i.MX6 Root Ports implement the MSI Capability and may use MSI to request
      service for events like PME, hotplug, AER, etc.  In addition, on i.MX6, the
      MSI Enable bit controls delivery of MSI interrupts from components below
      the Root Port.
      
      Prior to f3fdfc4a ("PCI: Remove host driver Kconfig selection of
      CONFIG_PCIEPORTBUS"), enabling CONFIG_PCI_IMX6 automatically also enabled
      CONFIG_PCIEPORTBUS, and when portdrv claimed the Root Ports, it set the MSI
      Enable bit so it could use PME, hotplug, AER, etc.  As a side effect, that
      also enabled delivery of MSI interrupts from downstream components.
      
      The imx6q-pcie driver itself does not depend on portdrv, so set MSI Enable
      in imx6q-pcie so MSI from downstream components works even if nobody uses
      MSI for the Root Port events.
      
      Fixes: f3fdfc4a ("PCI: Remove host driver Kconfig selection of CONFIG_PCIEPORTBUS")
      Signed-off-by: default avatarRichard Zhu <hongxing.zhu@nxp.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Tested-by: default avatarSven Van Asbroeck <TheSven73@googlemail.com>
      Tested-by: default avatarTrent Piepho <tpiepho@impinj.com>
      Reviewed-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Acked-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      89c18358
    • Douglas Anderson's avatar
      kdb: Don't back trace on a cpu that didn't round up · 3818c29a
      Douglas Anderson authored
      [ Upstream commit 162bc7f5 ]
      
      If you have a CPU that fails to round up and then run 'btc' you'll end
      up crashing in kdb becaue we dereferenced NULL.  Let's add a check.
      It's wise to also set the task to NULL when leaving the debugger so
      that if we fail to round up on a later entry into the debugger we
      won't backtrace a stale task.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Acked-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3818c29a
    • Matthias Brugger's avatar
      thermal: bcm2835: enable hwmon explicitly · 6a7c0215
      Matthias Brugger authored
      [ Upstream commit d56c19d0 ]
      
      By defaul of-based thermal driver do not enable hwmon.
      This patch does this explicitly, so that the temperature can be read
      through the common hwmon sysfs.
      Signed-off-by: default avatarMatthias Brugger <mbrugger@suse.com>
      Acked-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6a7c0215
    • Finn Thain's avatar
      block/swim3: Fix -EBUSY error when re-opening device after unmount · 295b3e2a
      Finn Thain authored
      [ Upstream commit 296dcc40 ]
      
      When the block device is opened with FMODE_EXCL, ref_count is set to -1.
      This value doesn't get reset when the device is closed which means the
      device cannot be opened again. Fix this by checking for refcount <= 0
      in the release method.
      Reported-and-tested-by: default avatarStan Johnson <userm57@yahoo.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: default avatarFinn Thain <fthain@telegraphics.com.au>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      295b3e2a
    • Scott Wood's avatar
      fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address() · d579abca
      Scott Wood authored
      [ Upstream commit 0d9c9a23 ]
      
      These functions are called from atomic context:
      
      [    9.150239] BUG: sleeping function called from invalid context at /home/scott/git/linux/mm/slab.h:421
      [    9.158159] in_atomic(): 1, irqs_disabled(): 0, pid: 4432, name: ip
      [    9.163128] CPU: 8 PID: 4432 Comm: ip Not tainted 4.20.0-rc2-00169-g63d86876 #29
      [    9.163130] Call Trace:
      [    9.170701] [c0000002e899a980] [c0000000009c1068] .dump_stack+0xa8/0xec (unreliable)
      [    9.177140] [c0000002e899aa10] [c00000000007a7b4] .___might_sleep+0x138/0x164
      [    9.184440] [c0000002e899aa80] [c0000000001d5bac] .kmem_cache_alloc_trace+0x238/0x30c
      [    9.191216] [c0000002e899ab40] [c00000000065ea1c] .memac_add_hash_mac_address+0x104/0x198
      [    9.199464] [c0000002e899abd0] [c00000000065a788] .set_multi+0x1c8/0x218
      [    9.206242] [c0000002e899ac80] [c0000000006615ec] .dpaa_set_rx_mode+0xdc/0x17c
      [    9.213544] [c0000002e899ad00] [c00000000083d2b0] .__dev_set_rx_mode+0x80/0xd4
      [    9.219535] [c0000002e899ad90] [c00000000083d334] .dev_set_rx_mode+0x30/0x54
      [    9.225271] [c0000002e899ae10] [c00000000083d4a0] .__dev_open+0x148/0x1c8
      [    9.230751] [c0000002e899aeb0] [c00000000083d934] .__dev_change_flags+0x19c/0x1e0
      [    9.230755] [c0000002e899af60] [c00000000083d9a4] .dev_change_flags+0x2c/0x80
      [    9.242752] [c0000002e899aff0] [c0000000008554ec] .do_setlink+0x350/0xf08
      [    9.248228] [c0000002e899b170] [c000000000857ad0] .rtnl_newlink+0x588/0x7e0
      [    9.253965] [c0000002e899b740] [c000000000852424] .rtnetlink_rcv_msg+0x3e0/0x498
      [    9.261440] [c0000002e899b820] [c000000000884790] .netlink_rcv_skb+0x134/0x14c
      [    9.267607] [c0000002e899b8e0] [c000000000851840] .rtnetlink_rcv+0x18/0x2c
      [    9.274558] [c0000002e899b950] [c000000000883c8c] .netlink_unicast+0x214/0x318
      [    9.281163] [c0000002e899ba00] [c000000000884220] .netlink_sendmsg+0x348/0x444
      [    9.287076] [c0000002e899bae0] [c00000000080d13c] .sock_sendmsg+0x2c/0x54
      [    9.287080] [c0000002e899bb50] [c0000000008106c0] .___sys_sendmsg+0x2d0/0x2d8
      [    9.298375] [c0000002e899bd30] [c000000000811a80] .__sys_sendmsg+0x5c/0xb0
      [    9.303939] [c0000002e899be20] [c0000000000006b0] system_call+0x60/0x6c
      Signed-off-by: default avatarScott Wood <oss@buserror.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d579abca
    • Wenwen Wang's avatar
      gdrom: fix a memory leak bug · 711b2e7f
      Wenwen Wang authored
      [ Upstream commit 093c4821 ]
      
      In probe_gdrom(), the buffer pointed by 'gd.cd_info' is allocated through
      kzalloc() and is used to hold the information of the gdrom device. To
      register and unregister the device, the pointer 'gd.cd_info' is passed to
      the functions register_cdrom() and unregister_cdrom(), respectively.
      However, this buffer is not freed after it is used, which can cause a
      memory leak bug.
      
      This patch simply frees the buffer 'gd.cd_info' in exit_gdrom() to fix the
      above issue.
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      711b2e7f
    • Jia-Ju Bai's avatar
      isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw() · b67e3130
      Jia-Ju Bai authored
      [ Upstream commit 7418e652 ]
      
      In drivers/isdn/hisax/hfc_pci.c, the functions hfcpci_interrupt() and
      HFCPCI_l1hw() may be concurrently executed.
      
      HFCPCI_l1hw()
        line 1173: if (!cs->tx_skb)
      
      hfcpci_interrupt()
        line 942: spin_lock_irqsave();
        line 1066: dev_kfree_skb_irq(cs->tx_skb);
      
      Thus, a possible concurrency use-after-free bug may occur
      in HFCPCI_l1hw().
      
      To fix these bugs, the calls to spin_lock_irqsave() and
      spin_unlock_irqrestore() are added in HFCPCI_l1hw(), to protect the
      access to cs->tx_skb.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b67e3130
    • Minchan Kim's avatar
      zram: fix lockdep warning of free block handling · 3b3ee499
      Minchan Kim authored
      [ Upstream commit 3c9959e0 ]
      
      Patch series "zram idle page writeback", v3.
      
      Inherently, swap device has many idle pages which are rare touched since
      it was allocated.  It is never problem if we use storage device as swap.
      However, it's just waste for zram-swap.
      
      This patchset supports zram idle page writeback feature.
      
      * Admin can define what is idle page "no access since X time ago"
      * Admin can define when zram should writeback them
      * Admin can define when zram should stop writeback to prevent wearout
      
      Details are in each patch's description.
      
      This patch (of 7):
      
        ================================
        WARNING: inconsistent lock state
        4.19.0+ #390 Not tainted
        --------------------------------
        inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
        00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
        {SOFTIRQ-ON-W} state was registered at:
          _raw_spin_lock+0x2c/0x40
          zram_make_request+0x755/0xdc9
          generic_make_request+0x373/0x6a0
          submit_bio+0x6c/0x140
          __swap_writepage+0x3a8/0x480
          shrink_page_list+0x1102/0x1a60
          shrink_inactive_list+0x21b/0x3f0
          shrink_node_memcg.constprop.99+0x4f8/0x7e0
          shrink_node+0x7d/0x2f0
          do_try_to_free_pages+0xe0/0x300
          try_to_free_pages+0x116/0x2b0
          __alloc_pages_slowpath+0x3f4/0xf80
          __alloc_pages_nodemask+0x2a2/0x2f0
          __handle_mm_fault+0x42e/0xb50
          handle_mm_fault+0x55/0xb0
          __do_page_fault+0x235/0x4b0
          page_fault+0x1e/0x30
        irq event stamp: 228412
        hardirqs last  enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
        hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
        softirqs last  enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
        softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&zram->bitmap_lock)->rlock);
          <Interrupt>
            lock(&(&zram->bitmap_lock)->rlock);
      
         *** DEADLOCK ***
      
        no locks held by zram_verify/2095.
      
        stack backtrace:
        CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         <IRQ>
         dump_stack+0x67/0x9b
         print_usage_bug+0x1bd/0x1d3
         mark_lock+0x4aa/0x540
         __lock_acquire+0x51d/0x1300
         lock_acquire+0x90/0x180
         _raw_spin_lock+0x2c/0x40
         put_entry_bdev+0x1e/0x50
         zram_free_page+0xf6/0x110
         zram_slot_free_notify+0x42/0xa0
         end_swap_bio_read+0x5b/0x170
         blk_update_request+0x8f/0x340
         scsi_end_request+0x2c/0x1e0
         scsi_io_completion+0x98/0x650
         blk_done_softirq+0x9e/0xd0
         __do_softirq+0xcc/0x427
         irq_exit+0xd1/0xe0
         do_IRQ+0x93/0x120
         common_interrupt+0xf/0xf
         </IRQ>
      
      With writeback feature, zram_slot_free_notify could be called in softirq
      context by end_swap_bio_read.  However, bitmap_lock is not aware of that
      so lockdep yell out:
      
        get_entry_bdev
        spin_lock(bitmap->lock);
        irq
        softirq
        end_swap_bio_read
        zram_slot_free_notify
        zram_slot_lock <-- deadlock prone
        zram_free_page
        put_entry_bdev
        spin_lock(bitmap->lock); <-- deadlock prone
      
      With akpm's suggestion (i.e.  bitmap operation is already atomic), we
      could remove bitmap lock.  It might fail to find a empty slot if serious
      contention happens.  However, it's not severe problem because huge page
      writeback has already possiblity to fail if there is severe memory
      pressure.  Worst case is just keeping the incompressible in memory, not
      storage.
      
      The other problem is zram_slot_lock in zram_slot_slot_free_notify.  To
      make it safe is this patch introduces zram_slot_trylock where
      zram_slot_free_notify uses it.  Although it's rare to be contented, this
      patch adds new debug stat "miss_free" to keep monitoring how often it
      happens.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reviewed-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: default avatarJoey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3b3ee499
    • Waiman Long's avatar
      mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init · f73c7753
      Waiman Long authored
      [ Upstream commit 3c0c12cc ]
      
      When CONFIG_KASAN is enabled on large memory SMP systems, the deferrred
      pages initialization can take a long time.  Below were the reported init
      times on a 8-socket 96-core 4TB IvyBridge system.
      
        1) Non-debug kernel without CONFIG_KASAN
           [    8.764222] node 1 initialised, 132086516 pages in 7027ms
      
        2) Debug kernel with CONFIG_KASAN
           [  146.288115] node 1 initialised, 132075466 pages in 143052ms
      
      So the page init time in a debug kernel was 20X of the non-debug kernel.
      The long init time can be problematic as the page initialization is done
      with interrupt disabled.  In this particular case, it caused the
      appearance of following warning messages as well as NMI backtraces of all
      the cores that were doing the initialization.
      
      [   68.240049] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      [   68.241000] rcu: 	25-...0: (100 ticks this GP) idle=b72/1/0x4000000000000000 softirq=915/915 fqs=16252
      [   68.241000] rcu: 	44-...0: (95 ticks this GP) idle=49a/1/0x4000000000000000 softirq=788/788 fqs=16253
      [   68.241000] rcu: 	54-...0: (104 ticks this GP) idle=03a/1/0x4000000000000000 softirq=721/825 fqs=16253
      [   68.241000] rcu: 	60-...0: (103 ticks this GP) idle=cbe/1/0x4000000000000000 softirq=637/740 fqs=16253
      [   68.241000] rcu: 	72-...0: (105 ticks this GP) idle=786/1/0x4000000000000000 softirq=536/641 fqs=16253
      [   68.241000] rcu: 	84-...0: (99 ticks this GP) idle=292/1/0x4000000000000000 softirq=537/537 fqs=16253
      [   68.241000] rcu: 	111-...0: (104 ticks this GP) idle=bde/1/0x4000000000000000 softirq=474/476 fqs=16253
      [   68.241000] rcu: 	(detected by 13, t=65018 jiffies, g=249, q=2)
      
      The long init time was mainly caused by the call to kasan_free_pages() to
      poison the newly initialized pages.  On a 4TB system, we are talking about
      almost 500GB of memory probably on the same node.
      
      In reality, we may not need to poison the newly initialized pages before
      they are ever allocated.  So KASAN poisoning of freed pages before the
      completion of deferred memory initialization is now disabled.  Those pages
      will be properly poisoned when they are allocated or freed after deferred
      pages initialization is done.
      
      With this change, the new page initialization time became:
      
      [   21.948010] node 1 initialised, 132075466 pages in 18702ms
      
      This was still about double the non-debug kernel time, but was much
      better than before.
      
      Link: http://lkml.kernel.org/r/1544459388-8736-1-git-send-email-longman@redhat.comSigned-off-by: default avatarWaiman Long <longman@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f73c7753
    • Larry Chen's avatar
      ocfs2: improve ocfs2 Makefile · 066206bc
      Larry Chen authored
      [ Upstream commit 9e6aea22 ]
      
      Included file path was hard-wired in the ocfs2 makefile, which might
      causes some confusion when compiling ocfs2 as an external module.
      
      Say if we compile ocfs2 module as following.
      cp -r /kernel/tree/fs/ocfs2 /other/dir/ocfs2
      cd /other/dir/ocfs2
      make -C /path/to/kernel_source M=`pwd` modules
      
      Acutally, the compiler wil try to find included file in
      /kernel/tree/fs/ocfs2, rather than the directory /other/dir/ocfs2.
      
      To fix this little bug, we introduce the var $(src) provided by kbuild.
      $(src) means the absolute path of the running kbuild file.
      
      Link: http://lkml.kernel.org/r/20181108085546.15149-1-lchen@suse.comSigned-off-by: default avatarLarry Chen <lchen@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      066206bc
    • Junxiao Bi's avatar
      ocfs2: don't clear bh uptodate for block read · 69e63b49
      Junxiao Bi authored
      [ Upstream commit 70306d9d ]
      
      For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
      and submit the io, second wait io done, last check whether bh uptodate, if
      not return io error.
      
      If two sync io for the same bh were issued, it could be the first io done
      and set uptodate flag, but just before check that flag, the second io came
      in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
      will return IO error.
      
      Indeed it's not necessary to clear uptodate flag, as the io end handler
      end_buffer_read_sync() will set or clear it based on io succeed or failed.
      
      The following message was found from a nfs server but the underlying
      storage returned no error.
      
      [4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
      [4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
      [4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
      [4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
      [4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
      
      Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
      
      Link: http://lkml.kernel.org/r/20181121020023.3034-4-junxiao.bi@oracle.comSigned-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      69e63b49
    • Randy Dunlap's avatar
      arch/sh/boards/mach-kfr2r09/setup.c: fix struct mtd_oob_ops build warning · dc8bd7ed
      Randy Dunlap authored
      [ Upstream commit 440e7b37 ]
      
      arch/sh/boards/mach-kfr2r09/setup.c does not need to #include
      <mtd/onenand.h>, and doing so causes a build warning, so drop that header
      file.
      
      In file included from ../arch/sh/boards/mach-kfr2r09/setup.c:28:
      ../include/linux/mtd/onenand.h:225:12: warning: 'struct mtd_oob_ops' declared inside parameter list will not be visible outside of this definition or declaration
           struct mtd_oob_ops *ops);
      
      Link: http://lkml.kernel.org/r/702f0a25-c63e-6912-4640-6ab0f00afbc7@infradead.org
      Fixes: f3590dc3 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Suggested-by: default avatarMiquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: default avatarMiquel Raynal <miquel.raynal@bootlin.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
      Cc: Magnus Damm <magnus.damm@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dc8bd7ed
    • Marc Zyngier's avatar
      scripts/decode_stacktrace: only strip base path when a prefix of the path · fa3c7c09
      Marc Zyngier authored
      [ Upstream commit 67a28de4 ]
      
      Running something like:
      
      	decodecode vmlinux .
      
      leads to interested results where not only the leading "." gets stripped
      from the displayed paths, but also anywhere in the string, displaying
      something like:
      
      	kvm_vcpu_check_block (arch/arm64/kvm/virt/kvm/kvm_mainc:2141)
      
      which doesn't help further processing.
      
      Fix it by only stripping the base path if it is a prefix of the path.
      
      Link: http://lkml.kernel.org/r/20181210174659.31054-3-marc.zyngier@arm.comSigned-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fa3c7c09
    • Jiri Olsa's avatar
      perf python: Do not force closing original perf descriptor in evlist.get_pollfd() · 395cbb9a
      Jiri Olsa authored
      [ Upstream commit a389aece ]
      
      Ondřej reported that when compiled with python3, the python extension
      regresses in evlist.get_pollfd function behaviour.
      
      The evlist.get_pollfd function creates file objects from evlist's fds
      and returns them in a list. The python3 version also sets them to 'close
      the original descriptor' when the object dies (is closed), by passing
      True via the 'closefd' arg in the PyFile_FromFd call.
      
      The python's closefd doc says:
      
        If closefd is False, the underlying file descriptor will be kept open
        when the file is closed.
      
      That's why the following line in python3 closes all evlist fds:
      
        evlist.get_pollfd()
      
      the returned list is immediately destroyed and that takes down the
      original events fds.
      
      Passing closefd as False to PyFile_FromFd to fix this.
      Reported-by: default avatarOndřej Lysoněk <olysonek@redhat.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jaroslav Škarvada <jskarvad@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 66dfdff0 ("perf tools: Add Python 3 support")
      Link: http://lkml.kernel.org/r/20181226112121.5285-1-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      395cbb9a
    • Ondrej Mosnacek's avatar
      cgroup: fix parsing empty mount option string · 4b5abffd
      Ondrej Mosnacek authored
      [ Upstream commit e250d91d ]
      
      This fixes the case where all mount options specified are consumed by an
      LSM and all that's left is an empty string. In this case cgroupfs should
      accept the string and not fail.
      
      How to reproduce (with SELinux enabled):
      
          # umount /sys/fs/cgroup/unified
          # mount -o context=system_u:object_r:cgroup_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified
          mount: /sys/fs/cgroup/unified: wrong fs type, bad option, bad superblock on cgroup2, missing codepage or helper program, or other error.
          # dmesg | tail -n 1
          [   31.575952] cgroup: cgroup2: unknown option ""
      
      Fixes: 67e9c74b ("cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type")
      [NOTE: should apply on top of commit 5136f636 ("cgroup: implement "nsdelegate" mount option"), older versions need manual rebase]
      Suggested-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4b5abffd
    • Sahitya Tummala's avatar
      f2fs: fix sbi->extent_list corruption issue · df13b036
      Sahitya Tummala authored
      [ Upstream commit e4589fa5 ]
      
      When there is a failure in f2fs_fill_super() after/during
      the recovery of fsync'd nodes, it frees the current sbi and
      retries again. This time the mount is successful, but the files
      that got recovered before retry, still holds the extent tree,
      whose extent nodes list is corrupted since sbi and sbi->extent_list
      is freed up. The list_del corruption issue is observed when the
      file system is getting unmounted and when those recoverd files extent
      node is being freed up in the below context.
      
      list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
      <...>
      kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
      lr : __list_del_entry_valid+0x94/0xb4
      pc : __list_del_entry_valid+0x94/0xb4
      <...>
      Call trace:
      __list_del_entry_valid+0x94/0xb4
      __release_extent_node+0xb0/0x114
      __free_extent_tree+0x58/0x7c
      f2fs_shrink_extent_tree+0xdc/0x3b0
      f2fs_leave_shrinker+0x28/0x7c
      f2fs_put_super+0xfc/0x1e0
      generic_shutdown_super+0x70/0xf4
      kill_block_super+0x2c/0x5c
      kill_f2fs_super+0x44/0x50
      deactivate_locked_super+0x60/0x8c
      deactivate_super+0x68/0x74
      cleanup_mnt+0x40/0x78
      __cleanup_mnt+0x1c/0x28
      task_work_run+0x48/0xd0
      do_notify_resume+0x678/0xe98
      work_pending+0x8/0x14
      
      Fix this by not creating extents for those recovered files if shrinker is
      not registered yet. Once mount is successful and shrinker is registered,
      those files can have extents again.
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      df13b036
    • Kangjie Lu's avatar
      niu: fix missing checks of niu_pci_eeprom_read · 4d6b5b08
      Kangjie Lu authored
      [ Upstream commit 26fd962b ]
      
      niu_pci_eeprom_read() may fail, so we should check its return value
      before using the read data.
      Signed-off-by: default avatarKangjie Lu <kjlu@umn.edu>
      Acked-by: default avatarShannon Nelson <shannon.lee.nelson@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4d6b5b08
    • Anton Ivanov's avatar
      um: Avoid marking pages with "changed protection" · 1e480ee6
      Anton Ivanov authored
      [ Upstream commit 8892d854 ]
      
      Changing protection is a very high cost operation in UML
      because in addition to an extra syscall it also interrupts
      mmap merge sequences generated by the tlb.
      
      While the condition is not particularly common it is worth
      avoiding.
      Signed-off-by: default avatarAnton Ivanov <anton.ivanov@cambridgegreys.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1e480ee6
    • Sahitya Tummala's avatar
      f2fs: fix use-after-free issue when accessing sbi->stat_info · 69e7f877
      Sahitya Tummala authored
      [ Upstream commit 60aa4d55 ]
      
      iput() on sbi->node_inode can update sbi->stat_info
      in the below context, if the f2fs_write_checkpoint()
      has failed with error.
      
      f2fs_balance_fs_bg+0x1ac/0x1ec
      f2fs_write_node_pages+0x4c/0x260
      do_writepages+0x80/0xbc
      __writeback_single_inode+0xdc/0x4ac
      writeback_single_inode+0x9c/0x144
      write_inode_now+0xc4/0xec
      iput+0x194/0x22c
      f2fs_put_super+0x11c/0x1e8
      generic_shutdown_super+0x70/0xf4
      kill_block_super+0x2c/0x5c
      kill_f2fs_super+0x44/0x50
      deactivate_locked_super+0x60/0x8c
      deactivate_super+0x68/0x74
      cleanup_mnt+0x40/0x78
      
      Fix this by moving f2fs_destroy_stats() further below iput() in
      both f2fs_put_super() and f2fs_fill_super() paths.
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      69e7f877
    • Ronnie Sahlberg's avatar
      cifs: check ntwrk_buf_start for NULL before dereferencing it · 5d3b4cd8
      Ronnie Sahlberg authored
      [ Upstream commit 59a63e47 ]
      
      RHBZ: 1021460
      
      There is an issue where when multiple threads open/close the same directory
      ntwrk_buf_start might end up being NULL, causing the call to smbCalcSize
      later to oops with a NULL deref.
      
      The real bug is why this happens and why this can become NULL for an
      open cfile, which should not be allowed.
      This patch tries to avoid a oops until the time when we fix the underlying
      issue.
      Signed-off-by: default avatarRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5d3b4cd8
    • Stefan Roese's avatar
      MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8 · 2cdf5246
      Stefan Roese authored
      [ Upstream commit 0b153944 ]
      
      Testing has shown, that when using mainline U-Boot on MT7688 based
      boards, the system may hang or crash while mounting the root-fs. The
      main issue here is that mainline U-Boot configures EBase to a value
      near the end of system memory. And with CONFIG_CPU_MIPSR2_IRQ_VI
      disabled, trap_init() will not allocate a new area to place the
      exception handler. The original value will be used and the handler
      will be copied to this location, which might already be used by some
      userspace application.
      
      The MT7688 supports VI - its config3 register is 0x00002420, so VInt
      (Bit 5) is set. But without setting CONFIG_CPU_MIPSR2_IRQ_VI this
      bit will not be evaluated to result in "cpu_has_vi" being set. This
      patch now selects CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8 which results
      trap_init() to allocate some memory for the exception handler.
      
      Please note that this issue was not seen with the Mediatek U-Boot
      version, as it does not touch EBase (stays at default of 0x8000.0000).
      This is strictly also not correct as the kernel (_text) resides
      here.
      Signed-off-by: default avatarStefan Roese <sr@denx.de>
      [paul.burton@mips.com: s/beeing/being/]
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Daniel Schwierzeck <daniel.schwierzeck@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2cdf5246
    • Nathan Chancellor's avatar
      crypto: ux500 - Use proper enum in hash_set_dma_transfer · 1dc571ff
      Nathan Chancellor authored
      [ Upstream commit 5ac93f80 ]
      
      Clang warns when one enumerated type is implicitly converted to another:
      
      drivers/crypto/ux500/hash/hash_core.c:169:4: warning: implicit
      conversion from enumeration type 'enum dma_data_direction' to different
      enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
                              direction, DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
                              ^~~~~~~~~
      1 warning generated.
      
      dmaengine_prep_slave_sg expects an enum from dma_transfer_direction.
      We know that the only direction supported by this function is
      DMA_TO_DEVICE because of the check at the top of this function so we can
      just use the equivalent value from dma_transfer_direction.
      
      DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1dc571ff
    • Nathan Chancellor's avatar
      crypto: ux500 - Use proper enum in cryp_set_dma_transfer · 2b020c09
      Nathan Chancellor authored
      [ Upstream commit 9d880c59 ]
      
      Clang warns when one enumerated type is implicitly converted to another:
      
      drivers/crypto/ux500/cryp/cryp_core.c:559:5: warning: implicit
      conversion from enumeration type 'enum dma_data_direction' to different
      enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
                                      direction, DMA_CTRL_ACK);
                                      ^~~~~~~~~
      drivers/crypto/ux500/cryp/cryp_core.c:583:5: warning: implicit
      conversion from enumeration type 'enum dma_data_direction' to different
      enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
                                      direction,
                                      ^~~~~~~~~
      2 warnings generated.
      
      dmaengine_prep_slave_sg expects an enum from dma_transfer_direction.
      Because we know the value of the dma_data_direction enum from the
      switch statement, we can just use the proper value from
      dma_transfer_direction so there is no more conversion.
      
      DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1
      DMA_FROM_DEVICE = DMA_DEV_TO_MEM = 2
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2b020c09
    • Michael Ellerman's avatar
      seq_buf: Make seq_buf_puts() null-terminate the buffer · 6201e8ad
      Michael Ellerman authored
      [ Upstream commit 0464ed24 ]
      
      Currently seq_buf_puts() will happily create a non null-terminated
      string for you in the buffer. This is particularly dangerous if the
      buffer is on the stack.
      
      For example:
      
        char buf[8];
        char secret = "secret";
        struct seq_buf s;
      
        seq_buf_init(&s, buf, sizeof(buf));
        seq_buf_puts(&s, "foo");
        printk("Message is %s\n", buf);
      
      Can result in:
      
        Message is fooªªªªªsecret
      
      We could require all users to memset() their buffer to zero before
      use. But that seems likely to be forgotten and lead to bugs.
      
      Instead we can change seq_buf_puts() to always leave the buffer in a
      null-terminated state.
      
      The only downside is that this makes the buffer 1 character smaller
      for seq_buf_puts(), but that seems like a good trade off.
      
      Link: http://lkml.kernel.org/r/20181019042109.8064-1-mpe@ellerman.id.auAcked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6201e8ad
    • Kangjie Lu's avatar
      hwmon: (lm80) fix a missing check of bus read in lm80 probe · 040f7697
      Kangjie Lu authored
      [ Upstream commit 9aa3aa15 ]
      
      In lm80_probe(), if lm80_read_value() fails, it returns a negative
      error number which is stored to data->fan[f_min] and will be further
      used. We should avoid using the data if the read fails.
      
      The fix checks if lm80_read_value() fails, and if so, returns with the
      error number.
      Signed-off-by: default avatarKangjie Lu <kjlu@umn.edu>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      040f7697