1. 24 May, 2018 1 commit
    • fix io_destroy()/aio_complete() race · 4faa9996
      Al Viro authored
      If io_destroy() gets to cancelling everything that can be cancelled and
      gets to kiocb_cancel() calling the function driver has left in ->ki_cancel,
      it becomes vulnerable to a race with IO completion.  At that point req
      is already taken off the list and aio_complete() does *NOT* spin until
      we (in free_ioctx_users()) release ->ctx_lock.  As the result, it proceeds
      to kiocb_free(), freeing req just as it gets passed to ->ki_cancel().
      
      Fix is simple - remove from the list after the call of kiocb_cancel().  All
      instances of ->ki_cancel() already have to cope with being called with
      iocb still on list - that's what happens in io_cancel(2).
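
The ordering described above can be illustrated with a small single-threaded userspace model. This is only a sketch: `toy_req`, `toy_cancel()` and `toy_cancel_all()` are hypothetical names, not the kernel's aio code, and no locking is modelled. It shows only that the cancel callback runs while the request is still linked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request on the context's active list. */
struct toy_req {
    struct toy_req *next;
    bool on_list;
    bool was_linked_during_cancel;  /* recorded by the cancel callback */
};

/* Stand-in for the driver's ->ki_cancel(): it only records what it saw. */
static inline void toy_cancel(struct toy_req *req)
{
    req->was_linked_during_cancel = req->on_list;
}

/* Cancel every request: call the callback first, unlink afterwards. */
static inline int toy_cancel_all(struct toy_req **head)
{
    int cancelled = 0;

    assert(head != NULL);
    while (*head != NULL) {
        struct toy_req *req = *head;

        toy_cancel(req);        /* req is still on the list here */
        *head = req->next;      /* only now take it off the list */
        req->next = NULL;
        req->on_list = false;
        cancelled++;
    }
    return cancelled;
}
```

In this model every callback observes `on_list == true`, which is the property the fix relies on: a completion path that sees the request still linked has to wait for the list lock before it may free it.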
      
      Cc: stable@kernel.org
      Fixes: 0460fef2 "aio: use cancellation list lazily"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 21 May, 2018 10 commits
    • aio: fix io_destroy(2) vs. lookup_ioctx() race · baf10564
      Al Viro authored
      kill_ioctx() used to have an explicit RCU delay between removing the
      reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
      At some point that delay had been removed, on the theory that
      percpu_ref_kill() itself contained an RCU delay.  Unfortunately, that was
      the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
      by lookup_ioctx().  As the result, we could get ctx freed right under
      lookup_ioctx().  Tejun has fixed that in a6d7cff4 ("fs/aio: Add explicit
      RCU grace period when freeing kioctx"); however, that fix is not enough.
      
      Suppose io_destroy() from one thread races with e.g. io_setup() from another;
      CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
      has picked it (under rcu_read_lock()).  Then CPU1 proceeds to drop the
      refcount, getting it to 0 and triggering a call of free_ioctx_users(),
      which proceeds to drop the secondary refcount and once that reaches zero
      calls free_ioctx_reqs().  That does
              INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
              queue_rcu_work(system_wq, &ctx->free_rwork);
      and schedules freeing the whole thing after RCU delay.
      
      In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
      refcount from 0 to 1 and returned the reference to io_setup().
      
      Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
      freed until after percpu_ref_get().  Sure, we'd increment the counter before
      ctx can be freed.  Now we are out of rcu_read_lock() and there's nothing to
      stop freeing of the whole thing.  Unfortunately, CPU2 assumes that since it
      has grabbed the reference, ctx is *NOT* going away until it gets around to
      dropping that reference.
      
      The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
      It's not costlier than what we currently do in normal case, it's safe to
      call since freeing *is* delayed and it closes the race window - either
      lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
      won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
      fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
      the object in question at all.
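
The difference between an unconditional get and a "tryget live" can be sketched in userspace with C11 atomics. This is a toy model under stated assumptions: `toy_ref` and its helpers are hypothetical, they pack a dead flag and a count into one atomic word, and they are not how the kernel's percpu_ref is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_REF_DEAD 1UL   /* low bit: kill has been requested */
#define TOY_REF_ONE  2UL   /* one reference, stored above the flag bit */

struct toy_ref { atomic_ulong state; };

static inline void toy_ref_init(struct toy_ref *r)
{
    atomic_init(&r->state, TOY_REF_ONE);   /* the initial reference */
}

/* Unconditional get: happily bumps a count that already dropped to zero. */
static inline void toy_ref_get(struct toy_ref *r)
{
    atomic_fetch_add(&r->state, TOY_REF_ONE);
}

/* Get that refuses once the object has been killed. */
static inline bool toy_ref_tryget_live(struct toy_ref *r)
{
    unsigned long old = atomic_load(&r->state);

    do {
        if (old & TOY_REF_DEAD)
            return false;                  /* treat as a lookup miss */
    } while (!atomic_compare_exchange_weak(&r->state, &old, old + TOY_REF_ONE));
    return true;
}

/* Drop one reference; returns true if it was the last one. */
static inline bool toy_ref_put(struct toy_ref *r)
{
    unsigned long old = atomic_fetch_sub(&r->state, TOY_REF_ONE);

    assert(old >= TOY_REF_ONE);
    return (old - TOY_REF_ONE) / TOY_REF_ONE == 0;
}

/* Mark dead, then drop the initial reference. */
static inline bool toy_ref_kill(struct toy_ref *r)
{
    atomic_fetch_or(&r->state, TOY_REF_DEAD);
    return toy_ref_put(r);
}
```

Either the lookup wins and the count cannot reach zero until the caller drops its reference, or the kill wins and the lookup fails without touching the count. Those are the two outcomes the message describes.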
      
      Cc: stable@kernel.org
      Fixes: a6d7cff4 "fs/aio: Add explicit RCU grace period when freeing kioctx"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ext2: fix a block leak · 5aa1437d
      Al Viro authored
      open file, unlink it, then use ioctl(2) to make it immutable or
      append only.  Now close it and watch the blocks *not* freed...
      
      Immutable/append-only checks belong in ->setattr().
      Note: the bug is old and backport to anything prior to 737f2e93
      ("ext2: convert to use the new truncate convention") will need
      these checks lifted into ext2_setattr().
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed · 3819bb0d
      Al Viro authored
      That can (and does, on some filesystems) happen - ->mkdir() (and thus
      vfs_mkdir()) can legitimately leave its argument negative and just
      unhash it, counting upon the lookup to pick the object we'd created
      next time we try to look at that name.
      
      Some vfs_mkdir() callers forget about that possibility...
      Acked-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed · 9c3e9025
      Al Viro authored
      That can (and does, on some filesystems) happen - ->mkdir() (and thus
      vfs_mkdir()) can legitimately leave its argument negative and just
      unhash it, counting upon the lookup to pick the object we'd created
      next time we try to look at that name.
      
      Some vfs_mkdir() callers forget about that possibility...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • unfuck sysfs_mount() · 7b745a4e
      Al Viro authored
      new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
      and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
      is not a solution - IS_ERR(root) is true in some cases when new_sb is true.
      
      Make sure new_sb is initialized (and matches the reality) in all cases and
      fix the condition for dropping kobj reference - we want it done precisely
      in those situations where the reference has not been transferred into a new
      super_block instance.
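
The rule stated above (set `new_sb` on every path, and drop the caller's reference exactly when no new superblock took it over) can be sketched as follows. The names `toy_mount_ns()` and `toy_caller_must_drop_ref()` and the stage enum are hypothetical and do not mirror the real kernfs or sysfs signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of a mount attempt. */
enum toy_stage {
    TOY_FAIL_EARLY,     /* fails before any superblock exists */
    TOY_FAIL_AFTER_SB,  /* fails after a fresh superblock was created */
    TOY_OK_NEW,         /* succeeds with a fresh superblock */
    TOY_OK_REUSED       /* succeeds by reusing an existing superblock */
};

/* Model of a mount helper: *new_sb is defined on every return path. */
static inline int toy_mount_ns(enum toy_stage stage, bool *new_sb)
{
    assert(new_sb != NULL);
    *new_sb = false;                 /* initialised before any early exit */
    if (stage == TOY_FAIL_EARLY)
        return -1;
    if (stage == TOY_OK_REUSED)
        return 0;
    *new_sb = true;                  /* a fresh superblock now owns the ref */
    if (stage == TOY_FAIL_AFTER_SB)
        return -1;                   /* error, yet new_sb stays true */
    return 0;
}

/* The caller drops its own reference only if it was not transferred. */
static inline bool toy_caller_must_drop_ref(enum toy_stage stage)
{
    bool new_sb;

    (void)toy_mount_ns(stage, &new_sb);
    return !new_sb;
}
```

The `TOY_FAIL_AFTER_SB` case is the one that defeats a test of the form "error or !new_sb": the call fails, yet the reference already belongs to the new superblock.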
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • kernfs: deal with kernfs_fill_super() failures · 82382ace
      Al Viro authored
      make sure that info->node is initialized early, so that kernfs_kill_sb()
      can list_del() it safely.
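
Why early initialisation makes the later unlink safe can be seen in a minimal userspace list model. Everything here is an illustrative assumption: `toy_list`, `toy_info`, `toy_fill_super()` and `toy_kill_sb()` are hypothetical names, and the delete helper reinitialises the node (closer to `list_del_init()`) rather than poisoning it.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct toy_list { struct toy_list *next, *prev; };

static inline void toy_list_init(struct toy_list *n)
{
    n->next = n;
    n->prev = n;
}

static inline void toy_list_add(struct toy_list *n, struct toy_list *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlinking dereferences n->prev and n->next, so both must be valid. */
static inline void toy_list_del(struct toy_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    toy_list_init(n);
}

static inline int toy_list_empty(const struct toy_list *head)
{
    return head->next == head;
}

struct toy_info { struct toy_list node; };

/* Setup: initialise the node before anything that can fail. */
static inline int toy_fill_super(struct toy_info *info, struct toy_list *supers,
                                 int fail_early)
{
    toy_list_init(&info->node);
    if (fail_early)
        return -1;
    toy_list_add(&info->node, supers);
    return 0;
}

/* Teardown runs after both successful and failed setup. */
static inline void toy_kill_sb(struct toy_info *info)
{
    assert(info->node.next != NULL && info->node.prev != NULL);
    toy_list_del(&info->node);
}
```

A node that points at itself unlinks as a no-op, so teardown after an early failure leaves the list of superblocks untouched.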
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cramfs: Fix IS_ENABLED typo · 08a8f308
      Joe Perches authored
      There's an extra C here...
      
      Fixes: 99c18ce5 ("cramfs: direct memory access support")
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • befs_lookup(): use d_splice_alias() · f4e4d434
      Al Viro authored
      RTFS(Documentation/filesystems/nfs/Exporting) if you try to make
      something exportable.
      
      Fixes: ac632f5b "befs: add NFS export support"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs_lookup: switch to d_splice_alias() · 87fbd639
      Al Viro authored
      Making something exportable takes more than providing ->s_export_ops.
      In particular, ->lookup() *MUST* use d_splice_alias() instead of
      d_add().
      
      Reading Documentation/filesystems/nfs/Exporting would've been a good idea;
      as it is, exporting AFFS is badly (and exploitably) broken.
      
      Partially-Fixes: ed4433d7 "fs/affs: make affs exportable"
      Acked-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs_lookup(): close a race with affs_remove_link() · 30da870c
      Al Viro authored
      we unlock the directory hash too early - if we are looking at secondary
      link and primary (in another directory) gets removed just as we unlock,
      we could have the old primary moved in place of the secondary, leaving
      us to look into freed entry (and leaving our dentry with ->d_fsdata
      pointing to a freed entry).
      
      Cc: stable@vger.kernel.org # 2.4.4+
      Acked-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 13 May, 2018 1 commit
    • fix breakage caused by d_find_alias() semantics change · b127125d
      Al Viro authored
      "VFS: don't keep disconnected dentries on d_anon" had a non-trivial
      side-effect - d_unhashed() now returns true for those dentries,
      making d_find_alias() skip them altogether.  For most of its callers
      that's fine - we really want a connected alias there.  However,
      there is a codepath where we relied upon picking such aliases
      if nothing else could be found - selinux delayed initialization
      of contexts for inodes on already mounted filesystems used to
      rely upon that.
      
      Cc: stable@kernel.org # f1ee6162 "VFS: don't keep disconnected dentries on d_anon"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 11 May, 2018 2 commits
    • fs: don't scan the inode cache before SB_BORN is set · 79f546a6
      Dave Chinner authored
      We recently had an oops reported on a 4.14 kernel in
      xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
      and so the m_perag_tree lookup walked into lala land.  It produces
      an oops down this path during the failed mount:
      
        radix_tree_gang_lookup_tag+0xc4/0x130
        xfs_perag_get_tag+0x37/0xf0
        xfs_reclaim_inodes_count+0x32/0x40
        xfs_fs_nr_cached_objects+0x11/0x20
        super_cache_count+0x35/0xc0
        shrink_slab.part.66+0xb1/0x370
        shrink_node+0x7e/0x1a0
        try_to_free_pages+0x199/0x470
        __alloc_pages_slowpath+0x3a1/0xd20
        __alloc_pages_nodemask+0x1c3/0x200
        cache_grow_begin+0x20b/0x2e0
        fallback_alloc+0x160/0x200
        kmem_cache_alloc+0x111/0x4e0
      
      The problem is that the superblock shrinker is running before the
      filesystem structures it depends on have been fully set up. i.e.
      the shrinker is registered in sget(), before ->fill_super() has been
      called, and the shrinker can call into the filesystem before
      fill_super() does its setup work. Essentially we are exposed to
      both use-after-free and use-before-initialisation bugs here.
      
      To fix this, add a check for the SB_BORN flag in super_cache_count.
      In general, this flag is not set until ->fs_mount() completes
      successfully, so we know that it is set after the filesystem
      setup has completed. This matches the trylock_super() behaviour
      which will not let super_cache_scan() run if SB_BORN is not set, and
      hence will prevent the superblock shrinker from entering the
      filesystem while it is being set up or after it has failed setup
      and is being torn down.
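
The guard described above amounts to an early return in the count callback. The sketch below is a single-threaded userspace model with hypothetical names (`toy_sb`, `TOY_SB_BORN`, `toy_cache_count()`). It ignores memory ordering, which a real implementation has to deal with.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SB_BORN 0x1u   /* set only once setup has fully completed */

/* Filesystem-private data that the count callback wants to inspect. */
struct toy_fs_info { unsigned long cached; };

struct toy_sb {
    unsigned int flags;
    struct toy_fs_info *fs_info;   /* may be NULL or stale before BORN */
};

/* Count callback: never look at fs_info unless the superblock is born. */
static inline unsigned long toy_cache_count(const struct toy_sb *sb)
{
    assert(sb != NULL);
    if (!(sb->flags & TOY_SB_BORN))
        return 0;                  /* nothing to reclaim yet, or any more */
    return sb->fs_info->cached;
}
```

Returning zero for a superblock that is not yet born means the callback never follows `fs_info` while setup is still running or after it has failed.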
      
      Cc: stable@kernel.org
      Signed-Off-By: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • do d_instantiate/unlock_new_inode combinations safely · 1e2e547a
      Al Viro authored
      For anything NFS-exported we do _not_ want to unlock new inode
      before it has grown an alias; original set of fixes got the
      ordering right, but missed the nasty complication in case of
      lockdep being enabled - unlock_new_inode() does
      	lockdep_annotate_inode_mutex_key(inode)
      which can only be done before anyone gets a chance to touch
      ->i_mutex.  Unfortunately, flipping the order and doing
      unlock_new_inode() before d_instantiate() opens a window when
      mkdir can race with open-by-fhandle on a guessed fhandle, leading
      to multiple aliases for a directory inode and all the breakage
      that follows from that.
      
      	Correct solution: a new primitive (d_instantiate_new())
      combining these two in the right order - lockdep annotate, then
      d_instantiate(), then the rest of unlock_new_inode().  All
      combinations of d_instantiate() with unlock_new_inode() should
      be converted to that.
      
      Cc: stable@kernel.org	# 2.6.29 and later
      Tested-by: Mike Marshall <hubcap@omnibond.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 02 May, 2018 2 commits
  6. 20 Apr, 2018 1 commit
  7. 16 Apr, 2018 10 commits
    • mm,vmscan: Allow preallocating memory for register_shrinker(). · 8e04944f
      Tetsuo Handa authored
      syzbot is catching so many bugs triggered by commit 9ee332d9
      ("sget(): handle failures of register_shrinker()"). That commit expected
      that calling kill_sb() from deactivate_locked_super() without successful
      fill_super() is safe, but the reality was different; some callers assign
      attributes which are needed for kill_sb() after sget() succeeds.
      
      For example, [1] is a report where sb->s_mode (which seems to be either
      FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
      assigned unless sget() succeeds. But it is not worth complicating sget()
      so that register_shrinker() failure path can safely call
      kill_block_super() via kill_sb(). Making alloc_super() fail if memory
      allocation for register_shrinker() failed is much simpler. Let's avoid
      calling deactivate_locked_super() from sget_userns() by preallocating
      memory for the shrinker and making register_shrinker() in sget_userns()
      never fail.
      
      [1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rpc_pipefs: fix double-dput() · 4a3877c4
      Al Viro authored
      if we ever hit rpc_gssd_dummy_depopulate() dentry passed to
      it has refcount equal to 1.  __rpc_rmpipe() drops it and
      dput() done after that hits an already freed dentry.
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • orangefs_kill_sb(): deal with allocation failures · 65903842
      Al Viro authored
      orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
      oops in that case.
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • jffs2_kill_sb(): deal with failed allocations · c66b23c2
      Al Viro authored
      jffs2_fill_super() might fail to allocate jffs2_sb_info;
      jffs2_kill_sb() must survive that.
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • hypfs_kill_super(): deal with failed allocations · a24cd490
      Al Viro authored
      hypfs_fill_super() might fail to allocate sbi; hypfs_kill_super()
      should not oops on that.
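
The pattern shared by this fix and the two similar ones above (teardown has to tolerate private data that was never allocated) can be sketched in userspace. `toy_sb`, `toy_sbi`, `toy_fill_super()` and `toy_kill_sb()` are hypothetical names and do not mirror any of the actual filesystems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-superblock private data with a nested allocation. */
struct toy_sbi { int *table; };

struct toy_sb { struct toy_sbi *s_fs_info; };

/* Setup: on allocation failure s_fs_info is simply left NULL. */
static inline int toy_fill_super(struct toy_sb *sb, bool alloc_fails)
{
    struct toy_sbi *sbi;

    assert(sb != NULL);
    sbi = alloc_fails ? NULL : calloc(1, sizeof(*sbi));
    if (sbi == NULL)
        return -1;
    sbi->table = calloc(4, sizeof(*sbi->table));
    if (sbi->table == NULL) {
        free(sbi);
        return -1;
    }
    sb->s_fs_info = sbi;
    return 0;
}

/* Teardown: called even when setup failed, so check before dereferencing. */
static inline bool toy_kill_sb(struct toy_sb *sb)
{
    struct toy_sbi *sbi = sb->s_fs_info;

    if (sbi == NULL)
        return false;          /* nothing was set up, nothing to free */
    free(sbi->table);          /* without the check this dereferences NULL */
    free(sbi);
    sb->s_fs_info = NULL;
    return true;
}
```

The return value only reports whether anything was freed. The point is the NULL check ahead of the first dereference of the private data.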
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Linux 4.17-rc1 · 60cc43fc
      Linus Torvalds authored
    • Merge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · e37563bb
      Linus Torvalds authored
      Pull more btrfs updates from David Sterba:
       "We have queued a few more fixes (error handling, log replay,
        softlockup) and the rest is SPDX updates that touch almost all files
        so the diffstat is long"
      
      * tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: Only check first key for committed tree blocks
        btrfs: add SPDX header to Kconfig
        btrfs: replace GPL boilerplate by SPDX -- sources
        btrfs: replace GPL boilerplate by SPDX -- headers
        Btrfs: fix loss of prealloc extents past i_size after fsync log replay
        Btrfs: clean up resources during umount after trans is aborted
        btrfs: Fix possible softlock on single core machines
        Btrfs: bail out on error during replay_dir_deletes
        Btrfs: fix NULL pointer dereference in log_dir_items
    • Merge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6 · 09c9b0ea
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "SMB3 fixes, a few for stable, and some important cleanup work from
        Ronnie of the smb3 transport code"
      
      * tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: change validate_buf to validate_iov
        cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
        cifs: Change SMB2_open to return an iov for the error parameter
        cifs: add resp_buf_size to the mid_q_entry structure
        smb3.11: replace a 4 with server->vals->header_preamble_size
        cifs: replace a 4 with server->vals->header_preamble_size
        cifs: add pdu_size to the TCP_Server_Info structure
        SMB311: Improve checking of negotiate security contexts
        SMB3: Fix length checking of SMB3.11 negotiate request
        CIFS: add ONCE flag for cifs_dbg type
        cifs: Use ULL suffix for 64-bit constant
        SMB3: Log at least once if tree connect fails during reconnect
        cifs: smb2pdu: Fix potential NULL pointer dereference
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f0d98d85
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a set of minor (and safe changes) that didn't make the initial
        pull request plus some bug fixes.
      
        The status handling code is actually a running regression from the
        previous merge window which had an incomplete fix (now reverted) and
        most of the remaining bug fixes are for problems older than the
        current merge window"
      
      [ Side note: this merge also takes the base kernel git repository to 6+
        million objects for the first time. Technically we hit it a couple of
        merges ago already if you count all the tag objects, but now it
        reaches 6M+ objects reachable from HEAD.
      
        I was joking around that that's when I should switch to 5.0, because
        3.0 happened at the 2M mark, and 4.0 happened at 4M objects. But
        probably not, even if numerology is about as good a reason as any.
      
                                                                    - Linus ]
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: devinfo: Add Microsoft iSCSI target to 1024 sector blacklist
        scsi: cxgb4i: silence overflow warning in t4_uld_rx_handler()
        scsi: dpt_i2o: Use after free in I2ORESETCMD ioctl
        scsi: core: Make scsi_result_to_blk_status() recognize CONDITION MET
        scsi: core: Rename __scsi_error_from_host_byte() into scsi_result_to_blk_status()
        Revert "scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"
        scsi: aacraid: Insure command thread is not recursively stopped
        scsi: qla2xxx: Correct setting of SAM_STAT_CHECK_CONDITION
        scsi: qla2xxx: correctly shift host byte
        scsi: qla2xxx: Fix race condition between iocb timeout and initialisation
        scsi: qla2xxx: Avoid double completion of abort command
        scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
        scsi: scsi_dh: Don't look for NULL devices handlers by name
        scsi: core: remove redundant assignment to shost->use_blk_mq
    • Merge tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · ca71b3ba
      Linus Torvalds authored
      Pull more Kbuild updates from Masahiro Yamada:
      
       - pass HOSTLDFLAGS when compiling single .c host programs
      
       - build genksyms lexer and parser files instead of using shipped
         versions
      
       - rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency
      
       - let the top .gitignore globally ignore artifacts generated by flex,
         bison, and asn1_compiler
      
       - let the top Makefile globally clean artifacts generated by flex,
         bison, and asn1_compiler
      
       - use safer .SECONDARY marker instead of .PRECIOUS to prevent
         intermediate files from being removed
      
       - support -fmacro-prefix-map option to make __FILE__ a relative path
      
       - fix # escaping to prepare for the future GNU Make release
      
       - clean up deb-pkg by using debian tools instead of handrolled
         source/changes generation
      
       - improve rpm-pkg portability by supporting kernel-install as a
         fallback of new-kernel-pkg
      
       - extend Kconfig listnewconfig target to provide more information
      
      * tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: extend output of 'listnewconfig'
        kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg
        Kbuild: fix # escaping in .cmd files for future Make
        kbuild: deb-pkg: split generating packaging and build
        kbuild: use -fmacro-prefix-map to make __FILE__ a relative path
        kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers
        kbuild: rename *-asn1.[ch] to *.asn1.[ch]
        kbuild: clean up *-asn1.[ch] patterns from top-level Makefile
        .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore
        kbuild: add %.dtb.S and %.dtb to 'targets' automatically
        kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically
        genksyms: generate lexer and parser during build instead of shipping
        kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile
        .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore
        kbuild: use HOSTLDFLAGS for single .c executables
  8. 15 Apr, 2018 8 commits
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9fb71c2f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of fixes and updates for x86:
      
         - Address a swiotlb regression which was caused by the recent DMA
           rework and made driver fail because dma_direct_supported() returned
           false
      
         - Fix a signedness bug in the APIC ID validation which caused invalid
           APIC IDs to be detected as valid thereby bloating the CPU possible
           space.
      
         - Fix inconsistent config dependency/select magic for the MFD_CS5535
           driver.
      
         - Fix a corruption of the physical address space bits when encryption
           has reduced the address space and late cpuinfo updates overwrite
           the reduced bit information with the original value.
      
         - Dominik's syscall rework which consolidates the architecture
           specific syscall functions so all syscalls can be wrapped with the
           same macros. This allows switching x86/64 to struct pt_regs based
           syscalls. Extend the clearing of user space controlled registers in
           the entry path to the lower registers"
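
The class of signedness bug mentioned in the APIC ID item can be illustrated generically. The sketch assumes a hypothetical validity check of the form "id is below a limit". The limit `TOY_ID_LIMIT` and both helper names are made up for illustration and are not taken from the x86 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ID_LIMIT 255   /* hypothetical upper bound for a valid ID */

/* Signed parameter: an all-ones ID arrives as -1 and passes the check. */
static inline bool toy_id_valid_signed(int id)
{
    return id < TOY_ID_LIMIT;
}

/* Unsigned parameter: the same bit pattern is a huge value and is rejected. */
static inline bool toy_id_valid_unsigned(uint32_t id)
{
    return id < TOY_ID_LIMIT;
}
```

With a signed parameter the "invalid" all-ones pattern compares as a small negative number, so it is accepted. Switching the parameter type to an unsigned one makes the same comparison reject it.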
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic: Fix signedness bug in APIC ID validity checks
        x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
        x86/olpc: Fix inconsistent MFD_CS5535 configuration
        swiotlb: Use dma_direct_supported() for swiotlb_ops
        syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
        syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
        syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
        syscalls/core, syscalls/x86: Clean up syscall stub naming convention
        syscalls/x86: Extend register clearing on syscall entry to lower registers
        syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
        syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
        syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
        syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
        syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
        x86/syscalls: Don't pointlessly reload the system call number
        x86/mm: Fix documentation of module mapping range with 4-level paging
        x86/cpuid: Switch to 'static const' specifier
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6b0a02e8
      Linus Torvalds authored
      Pull x86 pti updates from Thomas Gleixner:
       "Another series of PTI related changes:
      
         - Remove the manual stack switch for user entries from the idtentry
           code. This debloats entry by 5k+ bytes of text.
      
         - Use the proper types for the asm/bootparam.h defines to prevent
           user space compile errors.
      
         - Use PAGE_GLOBAL for !PCID systems to gain back performance
      
         - Prevent setting of huge PUD/PMD entries when the entries are not
           leaf entries; otherwise the populated entries to which the PUD/PMD
           points get lost"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
        x86/pti: Leave kernel text global for !PCID
        x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
        x86/pti: Enable global pages for shared areas
        x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
        x86/mm: Comment _PAGE_GLOBAL mystery
        x86/mm: Remove extra filtering in pageattr code
        x86/mm: Do not auto-massage page protections
        x86/espfix: Document use of _PAGE_GLOBAL
        x86/mm: Introduce "default" kernel PTE mask
        x86/mm: Undo double _PAGE_PSE clearing
        x86/mm: Factor out pageattr _PAGE_GLOBAL setting
        x86/entry/64: Drop idtentry's manual stack switch for user entries
        x86/uapi: Fix asm/bootparam.h userspace compilation errors
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 71b8ebbf
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A few scheduler fixes:
      
         - Prevent a bogus warning vs. runqueue clock update flags in
           do_sched_rt_period_timer()
      
         - Simplify the helper functions which handle requests for skipping
           the runqueue clock update.

         - Do not unlock the tunables mutex in the error path of the cpu
           frequency scheduler utils. It's not held.

         - Enforce proper alignment for 'struct util_est' in sched_avg to
           prevent a misalignment fault on IA64"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Force proper alignment of 'struct util_est'
        sched/core: Simplify helpers for rq clock update skip requests
        sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
        sched/cpufreq/schedutil: Fix error path mutex unlock
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 174e7194
      Linus Torvalds authored
      Pull more perf updates from Thomas Gleixner:
       "A rather large set of perf updates:
      
        Kernel:
      
         - Fix various initialization issues
      
         - Prevent creating [ku]probes for non-CAP_SYS_ADMIN users
      
        Tooling:
      
         - Show only failing syscalls with 'perf trace --failure' (Arnaldo
           Carvalho de Melo)
      
                  e.g: See what 'openat' syscalls are failing:
      
              # perf trace --failure -e openat
               762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory
               <SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? >
               790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory
              ^C#
      
         - Show information about the event (freq, nr_samples, total
           period/nr_events) in the annotate --tui and --stdio2 'perf
           annotate' output, similar to the first line in the 'perf report
           --tui', but just for the samples for the annotated symbol
           (Arnaldo Carvalho de Melo)
      
         - Introduce 'perf version --build-options' to show what features were
           linked, aliased as well as a shorter 'perf -vv' (Jin Yao)
      
         - Add a "dso_size" sort order (Kim Phillips)
      
         - Remove redundant ')' in the tracepoint output in 'perf trace'
           (Changbin Du)
      
         - Synchronize x86's cpufeatures.h, no effect on tools (Arnaldo
           Carvalho de Melo)
      
         - Show group details on the title line in the annotate browser and
           'perf annotate --stdio2' output, so that the per-event columns can
           have headers (Arnaldo Carvalho de Melo)
      
         - Fixup vertical line separating metrics from instructions and
           cleaning unused lines at the bottom, both in the annotate TUI
           browser (Arnaldo Carvalho de Melo)
      
         - Remove duplicated 'samples' in lost samples warning in
           'perf report' (Arnaldo Carvalho de Melo)
      
         - Synchronize i915_drm.h, silencing the perf build process,
           automagically adding support for the new DRM_I915_QUERY ioctl
           (Arnaldo Carvalho de Melo)
      
         - Make auxtrace_queues__add_buffer() allocate struct buffer, from a
           patchkit already applied (Adrian Hunter)
      
         - Fix the --stdio2/TUI annotate output to include group details, be
           it for a recorded '{a,b,f}' explicit event group or when forcing
           group display using 'perf report --group' for a set of events not
           recorded as a group (Arnaldo Carvalho de Melo)
      
         - Fix display artifacts in the ui browser (base class for the
           annotate and main report/top TUI browser) related to the extra
           title lines work (Arnaldo Carvalho de Melo)
      
         - perf auxtrace refactorings, leftovers from a previously partially
           processed patchset (Adrian Hunter)
      
         - Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de
           Melo)
      
         - Synchronize i915_drm.h, silencing a perf build warning and in the
           process automagically adding support for a new ioctl command
           (Arnaldo Carvalho de Melo)
      
         - Fix a strncpy issue in uprobe tracing"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
        perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()
        tracing/uprobe_event: Fix strncpy corner case
        perf/core: Fix perf_uprobe_init()
        perf/core: Fix perf_kprobe_init()
        perf/core: Fix use-after-free in uprobe_perf_close()
        perf tests clang: Fix function name for clang IR test
        perf clang: Add support for recent clang versions
        perf tools: Fix perf builds with clang support
        perf tools: No need to include namespaces.h in util.h
        perf hists browser: Remove leftover from row returned from refresh
        perf hists browser: Show extra_title_lines in the 'D' debug hotkey
        perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering
        tools headers uapi: Synchronize i915_drm.h
        perf report: Remove duplicated 'samples' in lost samples warning
        perf ui browser: Fixup cleaning unused lines at the bottom
        perf annotate browser: Fixup vertical line separating metrics from instructions
        perf annotate: Show group details on the title line
        perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer
        perf/x86/intel: Move regs->flags EXACT bit init
        perf trace: Remove redundant ')'
        ...
      174e7194
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 19ca90de
      Linus Torvalds authored
      Pull x86 EFI bootup fixlet from Thomas Gleixner:
       "A single fix for an early boot warning caused by invoking
        this_cpu_has() before SMP initialization"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
      19ca90de
    • Linus Torvalds's avatar
      Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 68d54d3f
      Linus Torvalds authored
      Pull irq affinity fixes from Thomas Gleixner:
      
        - Fix error path handling in the affinity spreading code
      
        - Make affinity spreading smarter to avoid issues on systems which
          claim to have hotpluggable CPUs while in fact they can't hotplug
          anything.
      
          So instead of trying to spread the vectors (and thereby the
          associated device queues) to all possible CPUs, spread them on all
          present CPUs first. If there are leftover vectors after that first
          step, they are spread among the possible but not present CPUs. This
          keeps the code backwards compatible for virtual devices and NVMe,
          which allocate a queue per possible CPU, but makes the spreading
          smarter for devices which have fewer queues than possible or present
          CPUs.
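The two-step spreading described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: CPU sets are plain 32-bit masks, and the names `spread_cpus` and `spread_vectors` are hypothetical. It ignores NUMA-node grouping and does not attempt to mirror the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 32u /* assumption: CPU sets fit in one 32-bit mask */

/* Hypothetical helper: hand every CPU set in `cpus` to a vector,
 * round-robin, beginning at vector index `start`.
 * Returns how many CPUs were assigned. */
static unsigned int spread_cpus(uint32_t cpus, uint32_t *masks,
                                unsigned int nvecs, unsigned int start)
{
    unsigned int assigned = 0;

    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (!(cpus & (UINT32_C(1) << cpu)))
            continue;
        masks[(start + assigned) % nvecs] |= UINT32_C(1) << cpu;
        assigned++;
    }
    return assigned;
}

/* Step 1: spread the vectors over the present CPUs.
 * Step 2: spread the possible-but-not-present CPUs, starting at the first
 * vector left over by step 1 (or wrapping around if none is left). */
void spread_vectors(uint32_t present, uint32_t possible,
                    uint32_t *masks, unsigned int nvecs)
{
    unsigned int used;

    assert(nvecs > 0);
    for (unsigned int i = 0; i < nvecs; i++)
        masks[i] = 0;

    used = spread_cpus(present, masks, nvecs, 0);
    spread_cpus(possible & ~present, masks, nvecs, used % nvecs);
}
```

With CPUs 0-1 present and CPUs 0-3 possible, four vectors each get one CPU. With only two vectors, both vectors serve a present CPU first and the non-present CPUs are folded onto them afterwards.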
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/affinity: Spread irq vectors among present CPUs as far as possible
        genirq/affinity: Allow irq spreading from a given starting point
        genirq/affinity: Move actual irq vector spreading into a helper function
        genirq/affinity: Rename *node_to_possible_cpumask as *node_to_cpumask
        genirq/affinity: Don't return with empty affinity masks on error
      68d54d3f
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://github.com/openrisc/linux · 9dceab89
      Linus Torvalds authored
      Pull OpenRISC fixlet from Stafford Horne:
       "Just one small thing here. It came in a while back but I didn't have
        anything in my 4.16 queue. It's still the only thing for 4.17, so I'm
        sending it alone.
      
        Small cleanup: remove unused __ARCH_HAVE_MMU define"
      
      * tag 'for-linus' of git://github.com/openrisc/linux:
        openrisc: remove unused __ARCH_HAVE_MMU define
      9dceab89
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · b1cb4f93
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix crashes when loading modules built with a different
         CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.
      
       - Fix busy loops in the OPAL NVRAM driver if we get certain error
         conditions from firmware.
      
       - Remove tlbie trace points from KVM code that's called in real mode,
         because it causes crashes.
      
       - Fix checkstops caused by invalid tlbiel on Power9 Radix.
      
       - Ensure the set of CPU features we "know" are always enabled is
         actually the minimal set when we build with support for firmware
         supplied CPU features.
      
      Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
      
      * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
        powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
        KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
        powerpc/8xx: Fix build with hugetlbfs enabled
        powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
        powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
        powerpc/fscr: Enable interrupts earlier before calling get_user()
        powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
        powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
      b1cb4f93
  9. 14 Apr, 2018 5 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 18b7fd1c
      Linus Torvalds authored
      Merge yet more updates from Andrew Morton:
      
       - various hotfixes
      
       - kexec_file updates and feature work
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
        kernel/kexec_file.c: move purgatories sha256 to common code
        kernel/kexec_file.c: allow archs to set purgatory load address
        kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
        kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
        kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
        kernel/kexec_file.c: split up __kexec_load_puragory
        kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
        kernel/kexec_file.c: search symbols in read-only kexec_purgatory
        kernel/kexec_file.c: make purgatory_info->ehdr const
        kernel/kexec_file.c: remove checks in kexec_purgatory_load
        include/linux/kexec.h: silence compile warnings
        kexec_file, x86: move re-factored code to generic side
        x86: kexec_file: clean up prepare_elf64_headers()
        x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
        x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
        x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
        kexec_file,x86,powerpc: factor out kexec_file_ops functions
        kexec_file: make use of purgatory optional
        proc: revalidate misc dentries
        mm, slab: reschedule cache_reap() on the same CPU
        ...
      18b7fd1c
    • Philipp Rudo's avatar
      kernel/kexec_file.c: move purgatories sha256 to common code · df6f2801
      Philipp Rudo authored
      The code to verify the new kernel's sha digest is applicable for all
      architectures.  Move it to common code.
      
      One problem is the string.c implementation on x86.  Currently sha256
      includes x86/boot/string.h which defines memcpy and memset to be gcc
      builtins.  By moving the sha256 implementation to common code and
      changing the include to linux/string.h both functions are no longer
      defined.  Thus definitions have to be provided in x86/purgatory/string.c
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.com
      Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df6f2801
    • Philipp Rudo's avatar
      kernel/kexec_file.c: allow archs to set purgatory load address · 3be3f61d
      Philipp Rudo authored
      For s390 new kernels are loaded to fixed addresses in memory before they
      are booted.  With the current code this is a problem as it assumes the
      kernel will be loaded to an 'arbitrary' address.  In particular,
      kexec_locate_mem_hole searches for a large enough memory region and sets
      the load address (kexec_buffer->mem) to it.
      
      Luckily there is a simple workaround for this problem.  By returning 1
      in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off.  This
      allows the architecture to set kbuf->mem by hand.  While the trick works
      fine for the kernel it does not for the purgatory as here the
      architectures don't have access to its kexec_buffer.
      
      Give architectures access to the purgatory's kexec_buffer by changing
      kexec_load_purgatory to take a pointer to it.  With this change
      architectures have access to the buffer and can edit it as they need.
      
      A nice side effect of this change is that we can get rid of the
      purgatory_info->purgatory_load_address field.  As now the information
      stored there can directly be accessed from kbuf->mem.
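The workaround described above (a walker that returns 1 so that no memory hole is searched for) can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel code: `struct kbuf_model`, `first_fit`, `generic_walk`, `fixed_address_walk` and `locate_mem_hole` are hypothetical stand-ins, and the memory ranges are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer descriptor: only the two
 * fields this sketch needs. */
struct kbuf_model {
    unsigned long mem;   /* chosen load address */
    unsigned long memsz; /* bytes required */
};

typedef int (*range_cb)(unsigned long start, unsigned long end,
                        struct kbuf_model *kbuf);
typedef int (*walk_fn)(struct kbuf_model *kbuf, range_cb func);

/* Callback: accept the first range that is large enough; 1 means "done". */
static int first_fit(unsigned long start, unsigned long end,
                     struct kbuf_model *kbuf)
{
    if (end - start < kbuf->memsz)
        return 0;
    kbuf->mem = start;
    return 1;
}

/* Default walker: offer each (made-up) free range to the callback. */
int generic_walk(struct kbuf_model *kbuf, range_cb func)
{
    static const unsigned long ranges[][2] = {
        { 0x1000, 0x2000 }, { 0x100000, 0x200000 },
    };

    for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
        int ret = func(ranges[i][0], ranges[i][1], kbuf);
        if (ret)
            return ret;
    }
    return 0;
}

/* Fixed-address walker: never calls the callback and reports success,
 * so whatever address the caller stored in kbuf->mem is kept. */
int fixed_address_walk(struct kbuf_model *kbuf, range_cb func)
{
    (void)kbuf;
    (void)func;
    return 1;
}

/* Returns 0 on success, -1 if no suitable range was reported. */
int locate_mem_hole(struct kbuf_model *kbuf, walk_fn walk)
{
    assert(kbuf != NULL && walk != NULL);
    return walk(kbuf, first_fit) == 1 ? 0 : -1;
}
```

The point of the model is that the search only ever writes `mem` through the callback, so a walker that skips the callback leaves a preset address untouched. That only helps code which can reach the buffer descriptor, which is the access the commit provides for the purgatory.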
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
      Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3be3f61d
    • Philipp Rudo's avatar
      kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load · 8da0b724
      Philipp Rudo authored
      The current code uses the sh_offset field in purgatory_info->sechdrs to
      store a pointer to the current load address of the section.  Depending
      on whether the section will be loaded or not, this is either a pointer into
      purgatory_info->purgatory_buf or kexec_purgatory.  This is not only a
      violation of the ELF standard but also makes the code very hard to
      understand as you cannot tell if the memory you are using is read-only
      or not.
      
      Remove this misuse and store the offset of the section in
      purgatory_info->purgatory_buf in sh_offset.
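To illustrate the idea of keeping a real offset in sh_offset and deriving addresses from it, here is a minimal user-space sketch. The `struct sec_hdr` type and the `layout_sections` and `section_addr` helpers are hypothetical simplifications introduced for this example; they are not the kernel's Elf_Shdr handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified section header: only the fields used here. */
struct sec_hdr {
    uint64_t sh_offset;    /* byte offset of the section within the buffer */
    uint64_t sh_size;      /* section size in bytes */
    uint64_t sh_addralign; /* required alignment (0 or 1 means none) */
};

/* Assign each section an offset in the destination buffer, honouring
 * its alignment. Returns the total number of bytes required. */
size_t layout_sections(struct sec_hdr *secs, size_t n)
{
    size_t off = 0;

    assert(secs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t align = secs[i].sh_addralign ? (size_t)secs[i].sh_addralign : 1;

        off = (off + align - 1) / align * align; /* round up */
        secs[i].sh_offset = off;
        off += (size_t)secs[i].sh_size;
    }
    return off;
}

/* Resolve a section to memory as base + offset. Returns NULL if the
 * section does not fit inside the buffer. */
uint8_t *section_addr(uint8_t *buf, size_t buf_len, const struct sec_hdr *s)
{
    if (s->sh_offset > buf_len || s->sh_size > buf_len - s->sh_offset)
        return NULL;
    return buf + s->sh_offset;
}
```

Because the header holds only an offset, the caller decides which buffer to add it to, so it is always explicit whether the resulting pointer refers to writable or read-only memory.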
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
      Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8da0b724
    • Philipp Rudo's avatar
      kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs · 620f697c
      Philipp Rudo authored
      The main loop currently uses quite a lot of variables to update the
      section headers.  Some of them are unnecessary.  So clean them up a
      little.
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-9-prudo@linux.vnet.ibm.com
      Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      620f697c