1. 04 May, 2017 2 commits
  2. 02 May, 2017 1 commit
    • Eric Biggers's avatar
      ext4: inherit encryption xattr before other xattrs · aa1dca3b
      Eric Biggers authored
      When using both encryption and SELinux (or another feature that requires
      an xattr per file) on a filesystem with 256-byte inodes, each file's
      xattrs usually spill into an external xattr block.  Currently, the
      xattrs are inherited in the order ACL, security, then encryption.
      Therefore, if spillage occurs, the encryption xattr will always end up
      in the external block.  This is not ideal because the encryption xattrs
      contain a nonce, so they will always be unique and will prevent the
      external xattr blocks from being deduplicated.
      
      To improve the situation, change the inheritance order to encryption,
      ACL, then security.  This gives the encryption xattr a better chance to
      be stored in-inode, allowing the other xattr(s) to be deduplicated.
      
      Note that it may be better for userspace to format the filesystem with
      512-byte inodes in this case.  However, it's not the default.
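The effect of the ordering can be illustrated with a toy user-space model. This is only a sketch: the greedy "fits in the inode, else spills" rule, the 96 bytes of in-inode space, and the per-xattr sizes are all assumptions chosen for illustration, not values taken from ext4.

```c
#include <stddef.h>

/* Hypothetical xattr kinds, used only by this toy model. */
enum xattr_kind { XATTR_ACL, XATTR_SECURITY, XATTR_ENCRYPTION };

struct toy_xattr {
	enum xattr_kind kind;
	size_t size;		/* made-up size in bytes */
};

/*
 * Place xattrs in the given order: each one goes in-inode if it still
 * fits, otherwise it spills to the external block.  Returns a bitmask
 * (1u << kind) of the kinds that spilled.
 */
static unsigned int place_xattrs(const struct toy_xattr *x, size_t n,
				 size_t in_inode_space)
{
	unsigned int spilled = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (x[i].size <= in_inode_space)
			in_inode_space -= x[i].size;
		else
			spilled |= 1u << x[i].kind;
	}
	return spilled;
}
```

With the assumed sizes ACL = 44, security = 52, encryption = 48 and 96 bytes available, the old order spills the encryption xattr, while the new order spills the security xattr, which has a chance of being shared between files.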
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      aa1dca3b
  3. 01 May, 2017 1 commit
  4. 30 Apr, 2017 13 commits
    • Jan Kara's avatar
      ext4: avoid unnecessary transaction stalls during writeback · dddbd6ac
      Jan Kara authored
      Currently ext4_writepages() submits all pages with a transaction started.
      When no page needs block allocation or extent conversion, we can submit
      all dirty pages in the inode while holding a single transaction handle,
      and when the device is congested this can take a significant amount of
      time. Thus ext4_writepages() can block transaction commits for extended
      periods of time.
      
      Take for example a simple benchmark simulating a PostgreSQL database
      (pgioperf in mmtest). The benchmark runs 16 processes doing random reads
      from a huge file, one process doing random writes to the huge file, and
      one process doing sequential writes to a small file and frequently
      running fsync. With an unpatched kernel, transaction commits take on
      average ~18s with a standard deviation of ~41s; the top 5 commit times are:
      
      274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.
      
      After this patch, transaction commits take on average 0.1s with a
      standard deviation of 0.15s; the top 5 commit times are:
      
      0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s
      
      [ Modified so we use an explicit do_map flag instead of relying on
        io_end not being allocated, since io_end->inode is needed for I/O
        error handling. -- tytso ]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      dddbd6ac
    • Andrew Perepechko's avatar
      ext4: preload block group descriptors · 85c8f176
      Andrew Perepechko authored
      With the meta_bg option enabled, block group descriptor reads
      are not sequential and require optimization.
      Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      85c8f176
    • Eric Biggers's avatar
      ext4: make ext4_shutdown() static · 1a20a630
      Eric Biggers authored
      Make the ext4_shutdown() function static, as suggested by running sparse
      ('make C=2 fs/ext4/').  This was the only such warning in fs/ext4/.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      1a20a630
    • Darrick J. Wong's avatar
      ext4: support GETFSMAP ioctls · 0c9ec4be
      Darrick J. Wong authored
      Support the GETFSMAP ioctls so that we can use the xfs free space
      management tools to probe ext4 as well.  Note that this is a partial
      implementation -- we only report fixed-location metadata and free space;
      everything else is reported as "unknown".
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0c9ec4be
    • Darrick J. Wong's avatar
      vfs: add common GETFSMAP ioctl definitions · d0649f04
      Darrick J. Wong authored
      Add the GETFSMAP headers to the VFS kernel headers.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d0649f04
    • Eric Biggers's avatar
      ext4: evict inline data when writing to memory map · 7b4cc978
      Eric Biggers authored
      Currently the case of writing via mmap to a file with inline data is not
      handled.  This is maybe a rare case since it requires a writable memory
      map of a very small file, but it is trivial to trigger on an
      inline_data filesystem, and it causes the
      'BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));' in
      ext4_writepages() to be hit:
      
          mkfs.ext4 -O inline_data /dev/vdb
          mount /dev/vdb /mnt
          xfs_io -f /mnt/file \
      	-c 'pwrite 0 1' \
      	-c 'mmap -w 0 1m' \
      	-c 'mwrite 0 1' \
      	-c 'fsync'
      
      	kernel BUG at fs/ext4/inode.c:2723!
      	invalid opcode: 0000 [#1] SMP
      	CPU: 1 PID: 2532 Comm: xfs_io Not tainted 4.11.0-rc1-xfstests-00301-g071d9acf3d1f #633
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      	task: ffff88003d3a8040 task.stack: ffffc90000300000
      	RIP: 0010:ext4_writepages+0xc89/0xf8a
      	RSP: 0018:ffffc90000303ca0 EFLAGS: 00010283
      	RAX: 0000028410000000 RBX: ffff8800383fa3b0 RCX: ffffffff812afcdc
      	RDX: 00000a9d00000246 RSI: ffffffff81e660e0 RDI: 0000000000000246
      	RBP: ffffc90000303dc0 R08: 0000000000000002 R09: 869618e8f99b4fa5
      	R10: 00000000852287a2 R11: 00000000a03b49f4 R12: ffff88003808e698
      	R13: 0000000000000000 R14: 7fffffffffffffff R15: 7fffffffffffffff
      	FS:  00007fd3e53094c0(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 00007fd3e4c51000 CR3: 000000003d554000 CR4: 00000000003406e0
      	Call Trace:
      	 ? _raw_spin_unlock+0x27/0x2a
      	 ? kvm_clock_read+0x1e/0x20
      	 do_writepages+0x23/0x2c
      	 ? do_writepages+0x23/0x2c
      	 __filemap_fdatawrite_range+0x80/0x87
      	 filemap_write_and_wait_range+0x67/0x8c
      	 ext4_sync_file+0x20e/0x472
      	 vfs_fsync_range+0x8e/0x9f
      	 ? syscall_trace_enter+0x25b/0x2d0
      	 vfs_fsync+0x1c/0x1e
      	 do_fsync+0x31/0x4a
      	 SyS_fsync+0x10/0x14
      	 do_syscall_64+0x69/0x131
      	 entry_SYSCALL64_slow_path+0x25/0x25
      
      We could try to be smart and keep the inline data in this case, or at
      least support delayed allocation when allocating the block, but these
      solutions would be more complicated and don't seem worthwhile given how
      rare this case seems to be.  So just fix the bug by calling
      ext4_convert_inline_data() when we're asked to make a page writable, so
      that any inline data gets evicted, with the block allocated immediately.
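For readers without xfs_io at hand, the access pattern from the reproducer can be sketched in plain C. This is a user-space approximation of the same steps (1-byte pwrite, writable shared mapping, store through the mapping, fsync); the function name is hypothetical, and the BUG only triggered on an unpatched kernel with an inline_data filesystem.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one byte with pwrite(), then dirty the same
 * byte through a writable shared mapping and fsync.  Returns 0 on success,
 * -1 on any failure.
 */
static int mmap_write_small_file(const char *path)
{
	const size_t map_len = 1 << 20;	/* 1 MiB, as in 'mmap -w 0 1m' */
	char *p;
	int ret = -1;
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;
	if (pwrite(fd, "a", 1, 0) != 1)
		goto out;

	p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	p[0] = 'b';		/* the write that makes the page writable */
	if (fsync(fd) == 0)	/* writeback is where the BUG_ON was hit */
		ret = 0;
	munmap(p, map_len);
out:
	close(fd);
	return ret;
}
```

Only the first page of the mapping is touched, so the store stays within the part of the mapping backed by the 1-byte file.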
      Reported-by: Nick Alcock <nick.alcock@oracle.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      7b4cc978
    • Eric Biggers's avatar
      ext4: remove ext4_xattr_check_entry() · 6ba644b9
      Eric Biggers authored
      ext4_xattr_check_entry() was redundant with validation of the full xattr
      entries list in ext4_xattr_check_entries(), which all callers also did.
      ext4_xattr_check_entry() also didn't actually do correct validation;
      specifically, it never checked that the value doesn't overlap the xattr
      names, nor did it account for padding when checking whether the xattr
      value overflows the available space.  So remove it to eliminate any
      potential confusion.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      6ba644b9
    • Eric Biggers's avatar
      ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries() · 2c4f9923
      Eric Biggers authored
      ext4_xattr_check_names() actually validates both the xattr names and
      values, not just the names.  So rename it to ext4_xattr_check_entries()
      to avoid confusion.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      2c4f9923
    • Eric Biggers's avatar
      ext4: merge ext4_xattr_list() into ext4_listxattr() · ba7ea1d8
      Eric Biggers authored
      There's no difference between ext4_xattr_list() and ext4_listxattr(), so
      merge them together and just have ext4_listxattr().  Some years ago they
      took different arguments, but that's no longer the case.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      ba7ea1d8
    • Eric Biggers's avatar
      ext4: constify static data that is never modified · d6006186
      Eric Biggers authored
      Constify static data in ext4 that is never (intentionally) modified so
      that it is placed in .rodata and benefits from memory protection.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d6006186
    • Eric Biggers's avatar
      ext4: trim return value and 'dir' argument from ext4_insert_dentry() · 1bc0af60
      Eric Biggers authored
      In the initial implementation of ext4 encryption, the filename was
      encrypted in ext4_insert_dentry(), which could fail and also required
      access to the 'dir' inode.  Since then ext4 filename encryption has been
      changed to encrypt the filename earlier, so we can revert the additions
      to ext4_insert_dentry().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      1bc0af60
    • Jan Kara's avatar
      jbd2: fix dbench4 performance regression for 'nobarrier' mounts · 5052b069
      Jan Kara authored
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
      JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
      filesystem is mounted with nobarrier mount option, journal superblock
      writes ended up being async writes after this patch and that caused
      heavy performance regression for dbench4 benchmark with high number of
      processes. In my test setup with HP RAID array with non-volatile write
      cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.
      
      Fix the problem by making sure journal superblock writes are always
      treated as synchronous since they generally block progress of the
      journalling machinery and thus the whole filesystem.
      
      Fixes: b685d3d6
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      5052b069
    • Jan Kara's avatar
      jbd2: Fix lockdep splat with generic/270 test · c52c47e4
      Jan Kara authored
      I've hit a lockdep splat with generic/270 test complaining that:
      
      3216.fsstress.b/3533 is trying to acquire lock:
       (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150
      
      but task is already holding lock:
       (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850
      
      The underlying problem is that jbd2_journal_force_commit_nested()
      (called from ext4_should_retry_alloc()) may get called while a
      transaction handle is started. In such case it takes care to not wait
      for commit of the running transaction (which would deadlock) but only
      for a commit of a transaction that is already committing (which is safe
      as that doesn't wait for any filesystem locks).
      
      In fact there are also other callers of jbd2_log_wait_commit() that take
      care to pass tid of a transaction that is already committing and for
      those cases, the lockdep instrumentation is too restrictive and leading
      to false positive reports. Fix the problem by calling
      jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
      transaction isn't already committing.
      
      Fixes: 1eaa566d
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      c52c47e4
  5. 28 Apr, 2017 1 commit
    • Theodore Ts'o's avatar
      mm: retry writepages() on ENOMEM when doing an data integrity writeback · 80a2ea9f
      Theodore Ts'o authored
      Currently, a file system's writepages() function must not fail with
      ENOMEM, since if it does, it's possible for buffered data to be lost.
      This is because on a data integrity writeback writepages() gets called
      but once, and if it returns ENOMEM, if you're lucky the error will get
      reflected back to the userspace process calling fsync().  If you
      aren't lucky, the user is unmounting the file system, and the dirty
      pages will simply be lost.
      
      For this reason, file system code generally will use GFP_NOFS, and in
      some cases, will retry the allocation in a loop, on the theory that
      "kernel livelocks are temporary; data loss is forever".
      Unfortunately, this can indeed cause livelocks, since inside the
      writepages() call, the file system is holding various mutexes, and
      these mutexes may prevent the OOM killer from killing its targeted
      victim if it is also holding on to those mutexes.
      
      A better solution would be to allow writepages() to call the memory
      allocator with flags that give greater latitude to the allocator to
      fail, and then release its locks and return ENOMEM, and in the case of
      background writeback, the writes can be retried at a later time.  In
      the case of data-integrity writeback, retry after waiting a brief
      amount of time.
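The retry policy described above can be sketched as a small user-space loop. This is an illustration under stated assumptions, not the kernel change itself: `writepages_fn`, `writeback_with_retry()`, `brief_wait()` and `flaky_writepages()` are hypothetical names, and the wait is reduced to a counter so the sketch has no timing dependency.

```c
#include <errno.h>

/* Hypothetical callback type standing in for a filesystem's writepages(). */
typedef int (*writepages_fn)(void *ctx);

/* Counts how often we backed off; real code would sleep briefly here. */
static int wait_count;

static void brief_wait(void)
{
	wait_count++;
}

/*
 * Retry on -ENOMEM only for data-integrity writeback.  Background
 * writeback simply returns the error so a later pass can try again.
 */
static int writeback_with_retry(writepages_fn fn, void *ctx, int data_integrity)
{
	int ret;

	for (;;) {
		ret = fn(ctx);
		if (ret != -ENOMEM || !data_integrity)
			return ret;
		brief_wait();
	}
}

/* Test double: fails with -ENOMEM until the counter in ctx reaches zero. */
static int flaky_writepages(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -ENOMEM;
	}
	return 0;
}
```

The point of the design is that the callee is free to drop its locks and fail, because the caller, which holds none of them, owns the decision to retry.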
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      80a2ea9f
  6. 09 Apr, 2017 6 commits
    • Linus Torvalds's avatar
      Linux 4.11-rc6 · 39da7c50
      Linus Torvalds authored
      39da7c50
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 84ced7fd
      Linus Torvalds authored
      Pull CIFS fixes from Steve French:
       "This is a set of CIFS/SMB3 fixes for stable.
      
        There is another set of four SMB3 reconnect fixes for stable in
        progress but they are still being reviewed/tested, so didn't want to
        wait any longer to send these five below"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        Reset TreeId to zero on SMB2 TREE_CONNECT
        CIFS: Fix build failure with smb2
        Introduce cifs_copy_file_range()
        SMB3: Rename clone_range to copychunk_range
        Handle mismatched open calls
      84ced7fd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 462e9a35
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "A number of ARM fixes:
      
         - prevent oopses caused by dma_get_sgtable() and declared DMA
           coherent memory
      
         - fix boot failure on nommu caused by ID_PFR1 access
      
         - a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
        ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
        arm: kprobes: Align stack to 8-bytes in test code
        arm: kprobes: Fix the return address of multiple kretprobes
        arm: kprobes: Skip single-stepping in recursing path if possible
        arm: kprobes: Allow to handle reentered kprobe on single-stepping
      462e9a35
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.11-rc6' of... · 5b50be74
      Linus Torvalds authored
      Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are 3 small fixes for 4.11-rc6.
      
        One resolves a reported issue with sysfs files that NeilBrown found,
        one is a documentation fix for the stable kernel rules, and the last
        is a small MAINTAINERS file update for kernfs"
      
      * tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        MAINTAINERS: separate out kernfs maintainership
        sysfs: be careful of error returns from ops->show()
        Documentation: stable-kernel-rules: fix stable-tag format
      5b50be74
    • Linus Torvalds's avatar
      Merge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 62e1fd08
      Linus Torvalds authored
      Pull staging/IIO driver fixes from Greg KH:
       "Here are a number of small IIO and staging driver fixes for 4.11-rc6.
        Nothing big here, just iio fixes for reported issues, and an ashmem
        fix for a very old bug that has been reported by a number of Android
        vendors"
      
      * tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
        iio: hid-sensor-attributes: Fix sensor property setting failure.
        iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
        iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
        iio: st_pressure: initialize lps22hb bootime
        iio: bmg160: reset chip when probing
        iio: cros_ec_sensors: Fix return value to get raw and calibbias data.
      62e1fd08
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 2a610b8a
      Linus Torvalds authored
      Pull VFS fixes from Al Viro:
       "statx followup fixes and a fix for stack-smashing on alpha"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        alpha: fix stack smashing in old_adjtimex(2)
        statx: Include a mask for stx_attributes in struct statx
        statx: Reserve the top bit of the mask for future struct expansion
        xfs: report crtime and attribute flags to statx
        ext4: Add statx support
        statx: optimize copy of struct statx to userspace
        statx: remove incorrect part of vfs_statx() comment
        statx: reject unknown flags when using NULL path
        Documentation/filesystems: fix documentation for ->getattr()
      2a610b8a
  7. 08 Apr, 2017 16 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 78d91a75
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Here's a pull request for 4.11-rc, fixing a set of issues mostly
        centered around the new scheduling framework. These have been brewing
        for a while, but split up into what we absolutely need in 4.11, and
        what we can defer until 4.12. These are well tested, on both single
        queue and multiqueue setups, and with and without shared tags. They
        fix several hangs that have happened in testing.
      
        This is obviously larger than I would have preferred at this point in
        time, but I don't think we can shave much off this and still get the
        desired results.
      
        In detail, this pull request contains:
      
         - a set of five fixes for NVMe, mostly from Christoph and one from
           Roland.
      
         - a series from Bart, fixing issues with dm-mq and SCSI shared tags
           and scheduling. Note that one of those patches' commit messages may
           read like an optimization, but it is in fact an important fix for
           queue restarts in particular.
      
         - a series from Omar, most importantly fixing a hang with multiple
           hardware queues when we fail to get a driver tag. Another important
           fix in there is for resizing hardware queues, which nbd does when
           handling multiple sockets for one connection.
      
         - fixing an imbalance in putting the ctx for hctx request allocations
           from Minchan"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: Restart a single queue if tag sets are shared
        dm rq: Avoid that request processing stalls sporadically
        scsi: Avoid that SCSI queues get stuck
        blk-mq: Introduce blk_mq_delay_run_hw_queue()
        blk-mq: remap queues when adding/removing hardware queues
        blk-mq-sched: fix crash in switch error path
        blk-mq-sched: set up scheduler tags when bringing up new queues
        blk-mq-sched: refactor scheduler initialization
        blk-mq: use the right hctx when getting a driver tag fails
        nvmet: fix byte swap in nvmet_parse_io_cmd
        nvmet: fix byte swap in nvmet_execute_write_zeroes
        nvmet: add missing byte swap in nvmet_get_smart_log
        nvme: add missing byte swap in nvme_setup_discard
        nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
        block: do not put mq context in blk_mq_alloc_request_hctx
      78d91a75
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · c3df1c7c
      Linus Torvalds authored
      Pull pin control fix from Linus Walleij:
       "This late fix for pin control is hopefully the last I send this cycle.
      
        The problem was detected early in the v4.11 release cycle and there
        has been some back and forth on how to solve it. Sadly the proper fix
        arrives late, but at least not too late.
      
        An issue was detected with pin control on the Freescale i.MX after the
        refactorings for more general group and function handling.
      
        We now have the proper fix for this"
      
      * tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
      c3df1c7c
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 894ca30c
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Some more powerpc fixes for 4.11:
      
        Headed to stable:
      
         - disable HFSCR[TM] if TM is not supported, fixes a potential host
           kernel crash triggered by a hostile guest, but only in
           configurations that no one uses
      
         - don't try to fix up misaligned load-with-reservation instructions
      
         - fix flush_(d|i)cache_range() called from modules on little endian
           kernels
      
         - add missing global TLB invalidate if cxl is active
      
         - fix missing preempt_disable() in crc32c-vpmsum
      
        And a fix for selftests build changes that went in this release:
      
         - selftests/powerpc: Fix standalone powerpc build
      
        Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
        Paul Mackerras"
      
      * tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
        powerpc/mm: Add missing global TLB invalidate if cxl is active
        powerpc/64: Fix flush_(d|i)cache_range() called from modules
        powerpc: Don't try to fix up misaligned load-with-reservation instructions
        powerpc: Disable HFSCR[TM] if TM is not supported
        selftests/powerpc: Fix standalone powerpc build
      894ca30c
    • Chris Salls's avatar
      mm/mempolicy.c: fix error handling in set_mempolicy and mbind. · cf01fb99
      Chris Salls authored
      In the case that compat_get_bitmap fails we do not want to copy the
      bitmap to the user as it will contain uninitialized stack data and leak
      sensitive data.
      Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf01fb99
    • Liping Zhang's avatar
      sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec · 425fffd8
      Liping Zhang authored
      Currently, inputting the following command will succeed but actually the
      value will be truncated:
      
        # echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat
      
      This is not friendly to the user, so instead, we should report an error
      when the value is larger than UINT_MAX.
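The difference between truncating and rejecting can be shown with a short user-space sketch. It is not the kernel's proc_douintvec() code; `parse_uint_checked()` and `truncate_to_uint()` are hypothetical helpers, and the sketch assumes a 32-bit `unsigned int`.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a number (decimal, octal or hex) and refuse
 * anything that does not fit in an unsigned int.
 */
static int parse_uint_checked(const char *s, unsigned int *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno || end == s)
		return -EINVAL;
	if (v > UINT_MAX)
		return -EINVAL;	/* reject instead of silently truncating */
	*out = (unsigned int)v;
	return 0;
}

/* What an unchecked conversion does: keep only the low bits. */
static unsigned int truncate_to_uint(unsigned long long v)
{
	return (unsigned int)v;
}
```

With the value from the example above, the unchecked conversion of 0x12ffffffff leaves 0xffffffff, while the checked helper returns -EINVAL.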
      
      Fixes: e7d316a0 ("sysctl: handle error writing UINT_MAX to u32 fields")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      425fffd8
    • Tejun Heo's avatar
      MAINTAINERS: separate out kernfs maintainership · 27f395b8
      Tejun Heo authored
      Separate out kernfs from driver core and add myself as a
      co-maintainer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      27f395b8
    • NeilBrown's avatar
      sysfs: be careful of error returns from ops->show() · c8a139d0
      NeilBrown authored
      ops->show() can return a negative error code.
      Commit 65da3484 ("sysfs: correctly handle short reads on PREALLOC attrs.")
      (in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
      would look like large numbers.
      As a result, if an error is returned, sysfs_kf_read() will return the
      value of 'count', typically 4096.
      
      Commit 17d0774f ("sysfs: correctly handle read offset on PREALLOC attrs")
      (in v4.8) extended this error to use the unsigned large 'len' as a size for
      memmove().
      Consequently, if ->show returns an error, then the first read() on the
      sysfs file will return 4096 and could return uninitialized memory to
      user-space.
      If the application performs a subsequent read, this will trigger a memmove()
      with an extremely large count, and is likely to crash the machine in bizarre ways.
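The signedness problem can be reproduced in a few lines of user-space C. This is a simplified model, not the sysfs_kf_read() source: `failing_show()`, `read_buggy()` and `read_fixed()` are hypothetical, and -19 merely stands in for some negative error code.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a ->show() callback that fails. */
static ssize_t failing_show(void)
{
	return -19;	/* some negative errno value */
}

/* Buggy pattern: the signed result is stored in an unsigned size_t. */
static ssize_t read_buggy(size_t count)
{
	size_t len = (size_t)failing_show();	/* -19 becomes a huge value */

	/* min(len, count) now picks count, so the error is lost. */
	return (ssize_t)(len < count ? len : count);
}

/* Fixed pattern: keep the result signed and check it before using it. */
static ssize_t read_fixed(size_t count)
{
	ssize_t len = failing_show();

	if (len < 0)
		return len;
	return (size_t)len < count ? len : (ssize_t)count;
}
```

Called with a count of 4096, the buggy variant reports 4096 bytes "read", matching the behaviour described above, while the fixed variant propagates the error.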
      
      This bug can currently only be triggered by reading from an md
      sysfs attribute declared with __ATTR_PREALLOC() during the
      brief period between when mddev_put() deletes an mddev from
      the ->all_mddevs list, and when mddev_delayed_delete() - which is
      scheduled on a workqueue - completes.
      Before this, an error won't be returned by ->show().
      After this, ->show() won't be called.
      
      I can reproduce it reliably only by putting a delay like
      	usleep_range(500000,700000);
      early in mddev_delayed_delete(). Then after creating an
      md device md0 run
        echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state
      
      The bug can be triggered without the usleep.
      
      Fixes: 65da3484 ("sysfs: correctly handle short reads on PREALLOC attrs.")
      Fixes: 17d0774f ("sysfs: correctly handle read offset on PREALLOC attrs")
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8a139d0
    • Johan Hovold's avatar
      Documentation: stable-kernel-rules: fix stable-tag format · cf903e9d
      Johan Hovold authored
      A patch documenting how to specify which kernels a particular fix should
      be backported to (seemingly) inadvertently added a minus sign after the
      kernel version. This particular stable-tag format had never been used
      prior to this patch, nor was it present when the patch in question
      was first submitted (it was added in v2 without any comment).
      
      Drop the minus sign to avoid any confusion.
      
      Fixes: fdc81b79 ("stable_kernel_rules: Add clause about specification of kernel versions to patch.")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf903e9d
    • Shuxiao Zhang's avatar
      staging: android: ashmem: lseek failed due to no FMODE_LSEEK. · 97fbfef6
      Shuxiao Zhang authored
      vfs_llseek() checks whether the file mode has FMODE_LSEEK and
      returns failure if it does not. But ashmem files can be seeked,
      so add FMODE_LSEEK to the ashmem file.
      
      Comment From Greg Hackmann:
      	ashmem_llseek() passes the llseek() call through to the backing
      	shmem file.  91360b02 ("ashmem: use vfs_llseek()") changed
      	this from directly calling the file's llseek() op into a VFS
      	layer call.  This also adds a check for the FMODE_LSEEK bit, so
      	without that bit ashmem_llseek() now always fails with -ESPIPE.
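The mode-bit gate described in the comment can be modelled in user space as follows. This is a toy sketch: `struct toy_file`, `TOY_FMODE_LSEEK` (including its bit value) and `toy_vfs_llseek()` are hypothetical and only mimic the shape of the check.

```c
#include <errno.h>

/* Hypothetical bit value, chosen only for this sketch. */
#define TOY_FMODE_LSEEK 0x4u

struct toy_file {
	unsigned int f_mode;	/* capability bits */
	long pos;		/* current file position */
};

/*
 * Model of a VFS-level seek: refuse with -ESPIPE unless the file was
 * opened with the seek capability bit set.
 */
static long toy_vfs_llseek(struct toy_file *f, long offset)
{
	if (!(f->f_mode & TOY_FMODE_LSEEK))
		return -ESPIPE;
	f->pos = offset;
	return f->pos;
}
```

In this model, a file created without the bit always fails to seek, and setting the bit at creation time is enough to make the same call succeed.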
      
      Fixes: 91360b02 ("ashmem: use vfs_llseek()")
      Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com>
      Tested-by: Greg Hackmann <ghackmann@google.com>
      Cc: stable <stable@vger.kernel.org> # 3.18+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      97fbfef6
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 8b65bb57
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Several fixes here, mostly having to do with either build errors or
        memory corruptions depending upon whether you have THP enabled or not"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: remove unused wp_works_ok macro
        sparc32: Export vac_cache_size to fix build error
        sparc64: Fix memory corruption when THP is enabled
        sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
        arch/sparc: Avoid DCTI Couples
        sparc64: kern_addr_valid regression
        sparc64: Add support for 2G hugepages
        sparc64: Fix size check in huge_pte_alloc
      8b65bb57
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 542380a2
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM:
         - Fix a problem with GICv3 userspace save/restore
         - Clarify GICv2 userspace save/restore ABI
         - Be more careful in clearing GIC LRs
         - Add missing synchronization primitive to our MMU handling code
      
        PPC:
         - Check for a NULL return from kzalloc
      
        s390:
         - Prevent translation exception errors on valid page tables for the
           instruction-execution-protection support
      
        x86:
         - Fix Page-Modification Logging when running a nested guest"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
        KVM: nVMX: initialize PML fields in vmcs02
        KVM: nVMX: do not leak PML full vmexit to L1
        KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
        KVM: arm64: Ensure LRs are clear when they should be
        kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
        KVM: s390: remove change-recording override support
        arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
        arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
      542380a2
    • Linus Torvalds's avatar
      Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit · 62fedca5
      Linus Torvalds authored
      Pull audit cleanup from Paul Moore:
       "A week later than I had hoped, but as promised, here is the audit
        uninline-fix we talked about during the last audit pull request.
      
        The patch is slightly different than what we originally discussed as
        it made more sense to keep the audit_signal_info() function in
        auditsc.c rather than move it and a bunch of other related
        variables/definitions into audit.c/audit.h.
      
        At some point in the future I need to look at how the audit code is
        organized across kernel/audit*, I suspect we could do things a bit
        better, but it doesn't seem like a -rc release is a good place for
        that ;)
      
        Regardless, this patch passes our tests without problem and looks good
        for v4.11"
      
      * 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
        audit: move audit_signal_info() into kernel/auditsc.c
      62fedca5
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 56c29979
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: move pcp and lru-pcp draining into single wq
        mailmap: update Yakir Yang email address
        mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
        dax: fix radix tree insertion race
        mm, thp: fix setting of defer+madvise thp defrag mode
        ptrace: fix PTRACE_LISTEN race corrupting task->state
        vmlinux.lds: add missing VMLINUX_SYMBOL macros
        mm/page_alloc.c: fix print order in show_free_areas()
        userfaultfd: report actual registered features in fdinfo
        mm: fix page_vma_mapped_walk() for ksm pages
      56c29979
    • Michal Hocko's avatar
      mm: move pcp and lru-pcp draining into single wq · ce612879
      Michal Hocko authored
      We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
      vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
      per cpu lru caches.  This seems more than necessary because both can run
      on a single WQ.  Neither blocks on locks that require a memory
      allocation, and neither performs any allocations itself.  We will save
      one rescuer thread this way.
      
      On the other hand drain_all_pages() queues work on the system wq, which
      doesn't have a rescuer, so it depends on memory allocation (when all
      workers are stuck allocating and new ones cannot be created).
      
      Initially we thought this would be more of a theoretical problem but
      Hugh Dickins has reported:
      
      : 4.11-rc has been giving me hangs after hours of swapping load.  At
      : first they looked like memory leaks ("fork: Cannot allocate memory");
      : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
      : before looking at /proc/meminfo one time, and the stat_refresh stuck
      : in D state, waiting for completion of flush_work like many kworkers.
      : kthreadd waiting for completion of flush_work in drain_all_pages().
      
      This worker should be using WQ_RECLAIM as well in order to guarantee
      forward progress.  We can reuse the same one as for lru draining and
      vmstat.
      
      Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Yang Li <pku.leo@gmail.com>
      Tested-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ce612879
    • Jeffy Chen's avatar
      mailmap: update Yakir Yang email address · cdcf4330
      Jeffy Chen authored
      Set current email address to replace previous employer's email addresses.
      
      Link: http://lkml.kernel.org/r/1491450722-6633-1-git-send-email-jeffy.chen@rock-chips.com
      Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cdcf4330
    • David Rientjes's avatar
      mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() · 460bcec8
      David Rientjes authored
      We got need_resched() warnings in swap_cgroup_swapoff() because
      swap_cgroup_ctrl[type].length is particularly large.
      
      Reschedule when needed.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      460bcec8
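The "reschedule when needed" pattern from the commit above can be sketched in userspace as a long release loop that yields the CPU at a fixed interval. This is an illustration only: `model_release_pages()` and `MODEL_RESCHED_BATCH` are hypothetical names, the batch size is an arbitrary assumption, and `sched_yield()` merely stands in for an in-kernel rescheduling point. It is not the actual `swap_cgroup_swapoff()` change.

```c
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interval: offer to yield once per this many entries. */
#define MODEL_RESCHED_BATCH 1024

/*
 * Frees every entry of a potentially very long array. After each full
 * batch the loop yields, so one huge array cannot monopolise the CPU.
 * Returns how many times the loop yielded.
 */
static size_t model_release_pages(void **pages, size_t length)
{
	size_t yields = 0;

	for (size_t i = 0; i < length; i++) {
		free(pages[i]);        /* free(NULL) is a no-op */
		pages[i] = NULL;

		if ((i + 1) % MODEL_RESCHED_BATCH == 0) {
			sched_yield(); /* userspace stand-in for rescheduling */
			yields++;
		}
	}
	return yields;
}
```

Yielding per batch rather than per entry keeps the overhead of the check small while still bounding how long the loop runs without giving other work a chance.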