1. 19 May, 2018 9 commits
    • mm: don't allow deferred pages with NEED_PER_CPU_KM · ab1e8d89
      Pavel Tatashin authored
      It is unsafe to do virtual to physical translations before mm_init() is
      called if struct page is needed in order to determine the memory section
      number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
      we initialize struct pages for all the allocated memory when deferred
      struct pages are used.
      
      My recent fix in commit c9e97a19 ("mm: initialize pages on demand
      during boot") exposed this problem, because it greatly reduced the number of
      pages that are initialized before mm_init(), but the problem existed
      even before my fix, as Fengguang Wu found.
      
      Below is a more detailed explanation of the problem.
      
      We initialize struct pages in four places:
      
      1. Early in boot a small set of struct pages is initialized to fill the
         first section, and lower zones.
      
      2. During mm_init() we initialize "struct pages" for all the memory that
         is allocated, i.e. reserved in memblock.
      
      3. Using on-demand logic when pages are allocated after the mm_init()
         call (when memblock is finished).

      4. After smp_init(), when the rest of the free deferred pages are
         initialized.
      
      The problem occurs if we try to do va to phys translation of memory
      between steps 1 and 2.  Because we have not yet initialized struct pages
      for all the reserved pages, it is inherently unsafe to do va to phys if
      the translation itself requires access of "struct page" as in case of
      this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP
      
      The following path exposes the problem:
      
        start_kernel()
         trap_init()
          setup_cpu_entry_areas()
           setup_cpu_entry_area(cpu)
            get_cpu_gdt_paddr(cpu)
             per_cpu_ptr_to_phys(addr)
              pcpu_addr_to_page(addr)
               virt_to_page(addr)
                pfn_to_page(__pa(addr) >> PAGE_SHIFT)
      
      We disable this path by not allowing NEED_PER_CPU_KM with deferred
      struct pages feature.
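
To make the failure mode concrete, here is a small userspace toy model, not kernel code. It assumes a layout in the spirit of SECTION_IN_PAGE_FLAGS: the section number lives in the top bits of the page flags, and each section's mem_map sits at an arbitrary base. All toy_* names, the bit position and the base table are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT the kernel's definitions. */
#define TOY_PAGES_PER_SECTION 4
#define TOY_NR_SECTIONS       4
#define TOY_SECTION_SHIFT     56	/* section number kept in the top byte of flags */

struct toy_page {
	uint64_t flags;
};

/* Backing store for every section's mem_map; zero-filled means "uninitialised". */
static struct toy_page toy_backing[TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION];

/* Where each section's mem_map starts inside toy_backing (deliberately not in pfn order). */
static const long toy_base[TOY_NR_SECTIONS] = { 8, 0, 12, 4 };

/* pfn -> page needs only the pfn, so it works on uninitialised pages. */
struct toy_page *toy_pfn_to_page(long pfn)
{
	return &toy_backing[toy_base[pfn / TOY_PAGES_PER_SECTION] +
			    pfn % TOY_PAGES_PER_SECTION];
}

/* What struct page initialisation would do for one pfn: record its section. */
void toy_init_page(long pfn)
{
	toy_pfn_to_page(pfn)->flags =
		(uint64_t)(pfn / TOY_PAGES_PER_SECTION) << TOY_SECTION_SHIFT;
}

/* page -> pfn must read the section number out of page->flags. */
long toy_page_to_pfn(const struct toy_page *page)
{
	long sec = (long)(page->flags >> TOY_SECTION_SHIFT);
	long idx = (long)(page - toy_backing);

	return sec * TOY_PAGES_PER_SECTION + (idx - toy_base[sec]);
}
```

In this model, converting pfn 9 to a page and back before toy_init_page(9) has run reads section 0 from the zeroed flags and yields the wrong pfn; after initialisation the round trip is exact. This mirrors why a translation that goes through "struct page" is unsafe between steps 1 and 2.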
      
      The problems are discussed in these threads:
        http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
        http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
        http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
      
      Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Dennis Zhou <dennisszhou@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: add Q: entry to kselftest for patchwork project · f3d8d3cf
      Shuah Khan (Samsung OSG) authored
      A new patchwork project is created to track kselftest patches.  Update
      the kselftest entry in the MAINTAINERS file adding 'Q:' entry:
      
        https://patchwork.kernel.org/project/linux-kselftest/list/
      
      Link: http://lkml.kernel.org/r/20180515164427.12201-1-shuah@kernel.org
      Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix tree: fix multi-order iteration race · 9f418224
      Ross Zwisler authored
      Fix a race in the multi-order iteration code which causes the kernel to
      hit a GP fault.  This was first seen with a production v4.15 based
      kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
      order 9 PMD DAX entries.
      
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember for example that
      an order 2 entry looks like this:
      
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      
      When we delete 'entry' from the tree, we call:
      
        radix_tree_delete()
          radix_tree_delete_item()
            __radix_tree_delete()
              replace_slot()
      
      replace_slot() first removes the siblings in order from the first to the
      last, then replaces 'entry' with NULL.  This means that for a
      brief period of time we end up with one or more of the siblings removed,
      so:
      
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      mm/filemap.c.
      
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
      
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      
      This lets you find the first sibling, and you skip them all in order.
      
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
      
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      'entry'.
      
      We fix this race by fixing the way that skip_siblings() detects sibling
      nodes.  Instead of testing against the preceding slot we instead look
      for siblings via is_sibling_entry() which compares against the position
      of the struct radix_tree_node.slots[] array.  This ensures that sibling
      entries are properly identified, even if they are no longer contiguous
      with the 'entry' they point to.
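
The difference between the two detection schemes can be sketched with a userspace toy model. This is not the kernel implementation: sibling pointers here are plain untagged pointers to the slot holding 'entry', and every toy_* name is hypothetical. The range test only follows the idea described above of comparing a pointer against the position of the slots[] array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAP_SIZE 8

struct toy_node {
	void *slots[TOY_MAP_SIZE];
};

/* Build an order 2 entry at slot 0: [entry][sibling][sibling][sibling]. */
void toy_insert_order2(struct toy_node *node, void *item)
{
	size_t i;

	node->slots[0] = item;
	for (i = 1; i < 4; i++)
		node->slots[i] = &node->slots[0];	/* siblings point back at 'entry' */
}

/* New scheme: anything pointing into this node's slots[] array is a sibling. */
bool toy_is_sibling_entry(const struct toy_node *node, const void *entry)
{
	uintptr_t p = (uintptr_t)entry;
	uintptr_t lo = (uintptr_t)node->slots;
	uintptr_t hi = (uintptr_t)(node->slots + TOY_MAP_SIZE);

	return p >= lo && p < hi;
}

/*
 * Old scheme: starting at the current slot, skip NULLs and anything that
 * exactly matches a pointer to the slot directly preceding 'start'.
 * Returns the first slot index that is treated as a real entry.
 */
size_t toy_skip_siblings_exact(const struct toy_node *node, size_t start)
{
	const void *sib = &node->slots[start - 1];
	size_t i = start;

	while (i < TOY_MAP_SIZE &&
	       (node->slots[i] == NULL || node->slots[i] == sib))
		i++;
	return i;
}

/* Fixed scheme: same walk, but siblings are recognised by the range test. */
size_t toy_skip_siblings_range(const struct toy_node *node, size_t start)
{
	size_t i = start;

	while (i < TOY_MAP_SIZE &&
	       (node->slots[i] == NULL ||
		toy_is_sibling_entry(node, node->slots[i])))
		i++;
	return i;
}
```

With the intact layout both walks skip all three siblings. Once slot 1 has been set to NULL and the walk resumes at slot 2, the exact-match version stops at slot 2 and treats the sibling pointer as a real entry, while the range-based version still skips it.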
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
      Fixes: 148deab2 ("radix-tree: improve multiorder iterators")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix tree test suite: multi-order iteration race · fd8f58c4
      Ross Zwisler authored
      Add a test which shows a race in the multi-order iteration code.  This
      test reliably hits the race in under a second on my machine, and is the
      result of a real bug report against a production v4.15 based
      kernel (4.15.6-300.fc27.x86_64).  With a real kernel this issue is hit
      when using order 9 PMD DAX radix tree entries.
      
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember that an order 2
      entry looks like this:
      
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      
      When we delete 'entry' from the tree, we call:
      
        radix_tree_delete()
          radix_tree_delete_item()
            __radix_tree_delete()
              replace_slot()
      
      replace_slot() first removes the siblings in order from the first to the
      last, then replaces 'entry' with NULL.  This means that for a
      brief period of time we end up with one or more of the siblings removed,
      so:
      
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      mm/filemap.c.
      
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
      
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      
      This lets you find the first sibling, and you skip them all in order.
      
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
      
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      'entry'.
      
      In the radix tree test suite this will be caught by the address
      sanitizer:
      
        ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
        0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
        READ of size 8 at 0x60c0008ae400 thread T3
            #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
            #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
            #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
            #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
            #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix tree test suite: add item_delete_rcu() · 3e252fa7
      Ross Zwisler authored
      Currently the lifetime of "struct item" entries in the radix tree is
      not controlled by RCU; the entries are instead freed inline as they
      are removed from the tree.
      
      In the following patches we add a test which has threads iterating over
      items pulled from the tree and verifying them in an
      rcu_read_lock()/rcu_read_unlock() section.  This means that though an
      item has been removed from the tree it could still be being worked on by
      other threads until the RCU grace period expires.  So, we need to
      actually free the "struct item" structures at the end of the grace
      period, just as we do with "struct radix_tree_node" items.
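
The deferred-free idea can be sketched as a single-threaded toy model. It is not the test suite's code: toy_call_rcu() and toy_grace_period_end() are hypothetical stand-ins for call_rcu() and the end of a grace period, and the struct layout is an assumption made for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_item {
	struct toy_rcu_head rcu_head;	/* first member, so the cast in the callback is valid */
	unsigned long index;
};

static struct toy_rcu_head *toy_pending;	/* callbacks waiting for the grace period */
int toy_freed_count;

/* Queue a callback instead of running it now (models call_rcu()). */
void toy_call_rcu(struct toy_rcu_head *head,
		  void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Models the end of a grace period: no reader can still hold old items. */
void toy_grace_period_end(void)
{
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
	}
}

/* Callback that actually releases the item. */
void toy_item_free_rcu(struct toy_rcu_head *head)
{
	free((struct toy_item *)head);
	toy_freed_count++;
}

struct toy_item *toy_item_create(unsigned long index)
{
	struct toy_item *item = malloc(sizeof(*item));

	if (item)
		item->index = index;
	return item;
}

/* Removal from the tree would happen here; the free itself is deferred. */
void toy_item_delete_rcu(struct toy_item *item)
{
	toy_call_rcu(&item->rcu_head, toy_item_free_rcu);
}
```

After toy_item_delete_rcu() the item is still readable, which is what a reader inside an rcu_read_lock()/rcu_read_unlock() section relies on; the memory is released only when toy_grace_period_end() runs.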
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-4-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix tree test suite: fix compilation issue · dcbbf25a
      Ross Zwisler authored
      Pulled from a patch from Matthew Wilcox entitled "xarray: Add definition
      of struct xarray":
      
      > From: Matthew Wilcox <mawilcox@microsoft.com>
      > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      
        https://patchwork.kernel.org/patch/10341249/
      
      These defines fix this compilation error:
      
        In file included from ./linux/radix-tree.h:6:0,
                         from ./linux/../../../../include/linux/idr.h:15,
                         from ./linux/idr.h:1,
                         from idr.c:4:
        ./linux/../../../../include/linux/idr.h: In function `idr_init_base':
        ./linux/../../../../include/linux/radix-tree.h:129:2: warning: implicit declaration of function `spin_lock_init'; did you mean `spinlock_t'? [-Wimplicit-function-declaration]
          spin_lock_init(&(root)->xa_lock);    \
          ^
        ./linux/../../../../include/linux/idr.h:126:2: note: in expansion of macro `INIT_RADIX_TREE'
          INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
          ^~~~~~~~~~~~~~~
      
      by providing a spin_lock_init() wrapper for the v4.17-rc* version of the
      radix tree test suite.
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-3-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • radix tree test suite: fix mapshift build target · 8d9fa88e
      Ross Zwisler authored
      Commit c6ce3e2f ("radix tree test suite: Add config option for map
      shift") introduced a phony makefile target called 'mapshift' that ends
      up generating the file generated/map-shift.h.  This phony target was
      then added as a dependency of the top level 'targets' build target,
      which is what is run when you go to tools/testing/radix-tree and just
      type 'make'.
      
      Unfortunately, this phony target doesn't actually work as a dependency,
      so you end up getting:
      
        $ make
        make: *** No rule to make target 'generated/map-shift.h', needed by 'main.o'.  Stop.
        make: *** Waiting for unfinished jobs....
      
      Fix this by making the file generated/map-shift.h our real makefile
      target, and add this as a dependency of the top level build target.
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mm.h: add new inline function vmf_error() · d97baf94
      Souptick Joarder authored
      In many places in drivers and file systems, errors were handled in a
      common way like below:
      
      	ret = (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
      
      vmf_error() will replace this and return a vm_fault_t error code.
      
      A lot of drivers and filesystems currently have a rather complex mapping
      of errno-to-VM_FAULT code.  We have been able to eliminate a lot of it
      by just returning VM_FAULT codes directly from functions which are
      called exclusively from the fault handling path.
      
      Some functions can be called both from the fault handler and other
      context which are expecting an errno, so they have to continue to return
      an errno.  Some users still need to choose different behaviour for
      different errnos, but vmf_error() captures the essential error
      translation that's common to all users, and those that need to handle
      additional errors can handle them first.
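
A userspace sketch of the translation described above. The toy_* names and the numeric values of the fault codes are stand-ins chosen for illustration, not the definitions from include/linux/mm.h, and the caller mapping -EBUSY is a hypothetical example of "handling additional errors first".

```c
#include <errno.h>

/* Stand-ins for vm_fault_t and the VM_FAULT_* codes; values are illustrative. */
typedef unsigned int toy_vm_fault_t;
#define TOY_VM_FAULT_OOM	0x0001u
#define TOY_VM_FAULT_SIGBUS	0x0002u
#define TOY_VM_FAULT_NOPAGE	0x0100u

/* The common translation: -ENOMEM becomes OOM, everything else SIGBUS. */
toy_vm_fault_t toy_vmf_error(int err)
{
	if (err == -ENOMEM)
		return TOY_VM_FAULT_OOM;
	return TOY_VM_FAULT_SIGBUS;
}

/*
 * Hypothetical caller that needs different behaviour for some results:
 * it handles those first and leaves the rest to the common helper.
 */
toy_vm_fault_t toy_fault_result(int err)
{
	if (err == 0 || err == -EBUSY)
		return TOY_VM_FAULT_NOPAGE;
	return toy_vmf_error(err);
}
```

Keeping the helper this small is the point: it captures only the translation every user shares, so callers with special cases stay explicit about them.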
      
      Link: http://lkml.kernel.org/r/20180510174826.GA14268@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly · 1e3054b9
      Matthew Wilcox authored
      I had neglected to increment the error counter when the tests failed,
      so failing tests were noisy but did not actually return an error
      code.
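
The pattern at issue, reporting a failure and also counting it, can be sketched as follows. This is a generic userspace illustration with hypothetical toy_* names, not the code in lib/test_bitmap.c.

```c
#include <stdio.h>

int toy_failed_tests;

/*
 * Compare a result with its expected value. On mismatch, print a message
 * AND bump the counter; forgetting the increment gives noisy output with
 * a "success" status.
 */
int toy_expect_eq(const char *what, unsigned long got, unsigned long expected)
{
	if (got == expected)
		return 1;

	fprintf(stderr, "%s: got %lu, expected %lu\n", what, got, expected);
	toy_failed_tests++;
	return 0;
}

/* Final status derived from the counter: nonzero if anything failed. */
int toy_test_status(void)
{
	return toy_failed_tests ? -1 : 0;
}
```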
      
      Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au
      Fixes: 3cc78125 ("lib/test_bitmap.c: add optimisation tests")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 18 May, 2018 9 commits
  3. 17 May, 2018 11 commits
    • Merge tag 'hwmon-for-linus-v4.17-rc6' of... · 3acf4e39
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Two k10temp fixes:
      
         - fix race condition when accessing System Management Network
           registers
      
         - fix reading critical temperatures on F15h M60h and M70h
      
        Also add PCI ID's for the AMD Raven Ridge root bridge"
      
      * tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (k10temp) Use API function to access System Management Network
        x86/amd_nb: Add support for Raven Ridge CPUs
        hwmon: (k10temp) Fix reading critical temperature register
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 58ddfe6c
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
      
       - ARM/ARM64 locking fixes
      
       - x86 fixes: PCID, UMIP, locking
      
       - improved support for recent Windows versions that have a 2048 Hz APIC
         timer
      
       - rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME
      
       - better behaved selftests
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
        KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
        KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
        KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
        KVM: arm/arm64: Properly protect VGIC locks from IRQs
        KVM: X86: Lower the default timer frequency limit to 200us
        KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
        kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
        KVM: selftests: exit with 0 status code when tests cannot be run
        KVM: hyperv: idr_find needs RCU protection
        x86: Delay skip of emulated hypercall instruction
        KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
    • Merge tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 7c9a0fc7
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "We have a core fix in the compat code for covering a potential race
        (double references), but it's a very minor change.
      
        The rest are all small device-specific quirks, as well as a correction
        of the new UAC3 support code"
      
      * tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Use Class Specific EP for UAC3 devices.
        ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup
        ALSA: usb: mixer: volume quirk for CM102-A+/102S+
        ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
        ALSA: control: fix a redundant-copy issue
    • kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      Michael S. Tsirkin authored
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      Guest doesn't really care whether it's the only task running on a host
      CPU as long as it's not preempted.
      
      And there are more reasons for Guest to be preempted than host CPU
      sharing, for example, with memory overcommit it can get preempted on a
      memory access, post copy migration can cause preemption, etc.
      
      Let's call it KVM_HINTS_REALTIME which seems to better
      match what guests expect.
      
      Also, the flag must be set on all vCPUs - current guests assume this.
      Note so in the documentation.

      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 3e9245c5
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
      
       - a fix for the vfio ccw translation code
      
       - update an incorrect email address in the MAINTAINERS file
      
       - fix a division by zero oops in the cpum_sf code found by trinity
      
       - two fixes for the error handling of the qdio code
      
       - several spectre related patches to convert all left-over indirect
         branches in the kernel to expoline branches
      
       - update defconfigs to avoid warnings due to the netfilter Kconfig
         changes
      
       - avoid several compiler warnings in the kexec_file code for s390
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/qdio: don't release memory in qdio_setup_irq()
        s390/qdio: fix access to uninitialized qdio_q fields
        s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
        s390: use expoline thunks in the BPF JIT
        s390: extend expoline to BC instructions
        s390: remove indirect branch from do_softirq_own_stack
        s390: move spectre sysfs attribute code
        s390/kernel: use expoline for indirect branches
        s390/ftrace: use expoline for indirect branches
        s390/lib: use expoline for indirect branches
        s390/crc32-vx: use expoline for indirect branches
        s390: move expoline assembler macros to a header
        vfio: ccw: fix cleanup if cp_prefetch fails
        s390/kexec_file: add declaration of purgatory related globals
        s390: update defconfigs
        MAINTAINERS: update s390 zcrypt maintainers email address
    • Merge tag 'selinux-pr-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 305bb552
      Linus Torvalds authored
      Pull SELinux fixes from Paul Moore:
       "A small pull request to fix a few regressions in the SELinux/SCTP code
        with applications that call bind() with AF_UNSPEC/INADDR_ANY.
      
        The individual commit descriptions have more information, but the
        commits themselves should be self explanatory"
      
      * tag 'selinux-pr-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: correctly handle sa_family cases in selinux_sctp_bind_connect()
        selinux: fix address family in bind() and connect() to match address/port
        selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
    • proc: do not access cmdline nor environ from file-backed areas · 7f7ccc2c
      Willy Tarreau authored
      proc_pid_cmdline_read() and environ_read() directly access the target
      process' VM to retrieve the command line and environment. If this
      process remaps these areas onto a file via mmap(), the requesting
      process may experience various issues such as extra delays if the
      underlying device is slow to respond.
      
      Let's simply refuse to access file-backed areas in these functions.
      For this we add a new FOLL_ANON gup flag that is passed to all calls
      to access_remote_vm(). The code already takes care of such failures
      (including unmapped areas). Accesses via /proc/pid/mem were not
      changed though.
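
The check that the new flag enables can be modelled in userspace as below. This is only a sketch: the struct, the flag value and the function name are hypothetical stand-ins, and returning -EFAULT for the refused case is an assumption about how such a failure would be reported.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_FOLL_ANON 0x1u	/* stand-in value for the new gup flag */

/* Minimal stand-in for a VMA: all we track is whether a file backs it. */
struct toy_vma {
	bool file_backed;
};

/*
 * Refuse file-backed mappings when the caller asked for anonymous memory
 * only; callers that do not pass the flag are unaffected.
 */
int toy_check_vma_flags(const struct toy_vma *vma, unsigned int gup_flags)
{
	if ((gup_flags & TOY_FOLL_ANON) && vma->file_backed)
		return -EFAULT;
	return 0;
}
```

Making the restriction opt-in through a flag is what leaves other users of the same access path, such as /proc/pid/mem, unchanged.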
      
      This was assigned CVE-2018-1120.
      
      Note for stable backports: the patch may apply to kernels prior to 4.11
      but silently miss one location; it must be checked that no call to
      access_remote_vm() keeps zero as the last argument.

      Reported-by: Qualys Security Advisory <qsa@qualys.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n · 1c1a2ee1
      Coly Li authored
      Commit 539d39eb ("bcache: fix wrong return value in bch_debug_init()")
      returns the return value of debugfs_create_dir() to bcache_init(). When
      CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
      bcache_init() fail.
      
      This patch makes bch_debug_init() always return 0 if CONFIG_DEBUG_FS=n,
      so bcache can continue to work for kernels which don't have debugfs
      enabled.
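
The before and after behaviour can be modelled in userspace like this. It is a sketch under assumptions: all toy_* names are hypothetical, a boolean parameter stands in for IS_ENABLED(CONFIG_DEBUG_FS), and the model assumes that creating a directory with debugfs disabled yields an error pointer, which is consistent with the "always returns 1" behaviour described above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True for NULL and for pointers in the error range, like IS_ERR_OR_NULL(). */
bool toy_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

static int toy_dir;	/* any valid address serves as a "created directory" */

/* Assumed behaviour: an error pointer comes back when debugfs is disabled. */
void *toy_debugfs_create_dir(bool debug_fs_enabled)
{
	return debug_fs_enabled ? (void *)&toy_dir : toy_err_ptr(-ENODEV);
}

/* Before the fix: the error check result is returned unconditionally. */
int toy_bch_debug_init_old(bool debug_fs_enabled)
{
	return toy_is_err_or_null(toy_debugfs_create_dir(debug_fs_enabled));
}

/* After the fix: a missing debugfs is not treated as an init failure. */
int toy_bch_debug_init_new(bool debug_fs_enabled)
{
	if (!debug_fs_enabled)
		return 0;
	return toy_is_err_or_null(toy_debugfs_create_dir(debug_fs_enabled));
}
```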
      
      Changelog:
      v4: Add Acked-by from Kent Overstreet.
      v3: Use IS_ENABLED(CONFIG_DEBUG_FS) to replace #ifdef DEBUG_FS.
      v2: Remove a warning information
      v1: Initial version.
      
      Fixes: Commit 539d39eb ("bcache: fix wrong return value in bch_debug_init()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Coly Li <colyli@suse.de>
      Reported-by: Massimo B. <massimo.b@gmx.net>
      Reported-by: Kai Krakow <kai@kaishome.de>
      Tested-by: Kai Krakow <kai@kaishome.de>
      Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • powerpc/powernv: Fix NVRAM sleep in invalid context when crashing · c1d2a313
      Nicholas Piggin authored
      Similarly to opal_event_shutdown, opal_nvram_write can be called in
      the crash path with irqs disabled. Special case the delay to avoid
      sleeping in invalid context.
      
      Fixes: 3b807033 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
      Cc: stable@vger.kernel.org # v3.2
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes · bc91d181
      Dave Airlie authored
      A single fix for a recent regression.
      
      * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
        drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
    • Merge tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 3d3aa969
      Dave Airlie authored
      - core: Fix regression in dev node offsets (Haneen)
      - vc4: Fix memory leak on driver close (Eric)
      - dumb-buffers: Prevent overflow in DIV_ROUND_UP() (Dan)
      
      Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      
      * tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc:
        drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()
        drm/vc4: Fix leak of the file_priv that stored the perfmon.
        drm: Match sysfs name in link removal to link creation
      3d3aa969
  4. 16 May, 2018 7 commits
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · e6506eb2
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Some of the ftrace internal events use a zero for a data size of a
        field event. This is increasingly important for the histogram trigger
        work that is being extended.
      
        While auditing trace events, I found that a couple of the xen events
        were used as just marking that a function was called, by creating a
        static array of size zero. This can play havoc with the tracing
        features if these events are used, because a zero size of a static
        array is denoted as a special nul terminated dynamic array (this is
        what the trace_marker code uses). But since the xen events have no
        size, they are not nul terminated, and unexpected results may occur.
      
        As trace events were never intended on being a marker to denote that a
        function was hit or not, especially since function tracing and kprobes
        can trivially do the same, the best course of action is to simply
        remove these events"
      
      * tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
      e6506eb2
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc5-vsprintf' of... · 9d38cd06
      Linus Torvalds authored
      Merge tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull memory barrier fix from Steven Rostedt:
       "The memory barrier usage in updating the random ptr hash for %p in
        vsprintf is incorrect.
      
        Instead of adding the read memory barrier into vsprintf() which will
        cause a slight degradation to a commonly used function in the kernel
        just to solve a very unlikely race condition that can only happen at
        boot up, change the code from using a variable branch to a
        static_branch.
      
        Not only does this solve the race condition, it actually will improve
        the performance of vsprintf() by removing the conditional branch that
        is only needed at boot"
      
      * tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        vsprintf: Replace memory barrier with static_key for random_ptr_key update
      9d38cd06
    • Shuah Khan (Samsung OSG)'s avatar
      usbip: usbip_host: fix bad unlock balance during stub_probe() · c171654c
      Shuah Khan (Samsung OSG) authored
      stub_probe() calls put_busid_priv() in an error path when device isn't
      found in the busid_table. Fix it by making put_busid_priv() safe to be
      called with null struct bus_id_priv pointer.
      
      This problem happens when "usbip bind" is run without the usbip_host
      driver loaded and modprobe is run afterwards. The first failed bind attempt
      unbinds the device from the original driver, and when usbip_host is modprobed,
      stub_probe() runs, doesn't find the device in its busid table, and calls
      put_busid_priv() with a null bus_id_priv pointer.
      
      usbip-host 3-10.2: 3-10.2 is not in match_busid table...  skip!
      
      [  367.359679] =====================================
      [  367.359681] WARNING: bad unlock balance detected!
      [  367.359683] 4.17.0-rc4+ #5 Not tainted
      [  367.359685] -------------------------------------
      [  367.359688] modprobe/2768 is trying to release lock (
      [  367.359689]
      ==================================================================
      [  367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110
      [  367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768
      
      [  367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5
      
      Fixes: 22076557 ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus
      Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
      Cc: stable <stable@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c171654c
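      The usbip fix above makes the release helper tolerate a null pointer so
      that the "not found" error path cannot unlock a lock that was never
      taken. A minimal user-space sketch of that idea follows. Every name in
      it (demo_busid_priv, demo_get_busid_priv, demo_put_busid_priv) is
      hypothetical, and a plain integer stands in for the driver's lock; it is
      not the driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table entry; "locked" models its lock. */
struct demo_busid_priv {
	int locked;
};

/* Return entry idx with its lock held, or NULL if idx is not in the table. */
static struct demo_busid_priv *demo_get_busid_priv(struct demo_busid_priv *table,
						   size_t n, size_t idx)
{
	if (idx >= n)
		return NULL;
	table[idx].locked = 1;
	return &table[idx];
}

/* Release the entry; a NULL pointer is accepted and ignored. */
static void demo_put_busid_priv(struct demo_busid_priv *bid)
{
	if (bid)
		bid->locked = 0;
}

/* Exercise both the success path and the "not found" error path. */
static void demo_probe_paths(void)
{
	struct demo_busid_priv table[1] = { { 0 } };
	struct demo_busid_priv *bid;

	bid = demo_get_busid_priv(table, 1, 0);	/* found: lock is held */
	assert(bid != NULL && table[0].locked == 1);
	demo_put_busid_priv(bid);
	assert(table[0].locked == 0);

	bid = demo_get_busid_priv(table, 1, 5);	/* not found: nothing locked */
	assert(bid == NULL);
	demo_put_busid_priv(bid);		/* must be a no-op */
	assert(table[0].locked == 0);
}
```

      With the null check in the release helper, callers can use one common
      exit path whether or not the lookup succeeded.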
    • Dan Carpenter's avatar
      drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl() · 2b620729
      Dan Carpenter authored
      There is a comment here which says that DIV_ROUND_UP() can overflow, and
      that's where the problem comes from.  Say you pick:
      
      	args->bpp = UINT_MAX - 7;
      	args->width = 4;
      	args->height = 1;
      
      The integer overflow in DIV_ROUND_UP() means "cpp" is UINT_MAX / 8 and
      because of how we picked args->width that means cpp < UINT_MAX / 4.
      
      I've fixed it by preventing the integer overflow in DIV_ROUND_UP().  I
      removed the check for !cpp because it's not possible after this change.
      I also changed all the 0xffffffffU references to U32_MAX.

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180516140026.GA19340@mwanda
      2b620729
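      The arithmetic behind the drm/dumb-buffers commit above can be tried in
      user space. The sketch below assumes a 32-bit unsigned int and uses a
      macro of the same shape as DIV_ROUND_UP(); demo_cpp_naive and
      demo_cpp_safe are hypothetical helpers, and the divide-then-round
      variant is only one way to avoid the wrap, not necessarily the check the
      patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel macro: the addition n + d - 1 can wrap. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bytes per pixel, computed naively; wraps when bpp > UINT32_MAX - 7. */
static uint32_t demo_cpp_naive(uint32_t bpp)
{
	return DEMO_DIV_ROUND_UP(bpp, 8u);
}

/* Hypothetical overflow-free variant: divide first, then round up. */
static uint32_t demo_cpp_safe(uint32_t bpp)
{
	return bpp / 8u + (bpp % 8u != 0);
}

static void demo_cpp_check(void)
{
	/* The value from the commit message: no wrap yet, cpp == UINT_MAX / 8. */
	assert(demo_cpp_naive(UINT32_MAX - 7) == UINT32_MAX / 8);
	assert(demo_cpp_safe(UINT32_MAX - 7) == UINT32_MAX / 8);

	/* One step further the addition wraps and the naive result is 0. */
	assert(demo_cpp_naive(UINT32_MAX) == 0);
	assert(demo_cpp_safe(UINT32_MAX) == UINT32_MAX / 8 + 1);
}
```

      A cpp of 0 is the case the old !cpp check guarded against; once the
      wrap is ruled out up front, that check has nothing left to catch.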
    • Steven Rostedt (VMware)'s avatar
      vsprintf: Replace memory barrier with static_key for random_ptr_key update · 85f4f12d
      Steven Rostedt (VMware) authored
      Reviewing Tobin's patches for getting pointers out early before
      entropy has been established, I noticed that there's a lone smp_mb() in
      the code. As with most lone memory barriers, this one appears to be
      incorrectly used.
      
      We currently basically have this:
      
      	get_random_bytes(&ptr_key, sizeof(ptr_key));
      	/*
      	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
      	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
      	 * after get_random_bytes() returns.
      	 */
      	smp_mb();
      	WRITE_ONCE(have_filled_random_ptr_key, true);
      
      And later we have:
      
      	if (unlikely(!have_filled_random_ptr_key))
      		return string(buf, end, "(ptrval)", spec);
      
      /* Missing memory barrier here. */
      
      	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
      
      As the CPU can perform speculative loads, we could have a situation
      with the following:
      
              CPU0                                    CPU1
              ----                                    ----
                                                      load ptr_key = 0
              store ptr_key = random
              smp_mb()
              store have_filled_random_ptr_key

                                                      load have_filled_random_ptr_key = true

                                                      BAD BAD BAD! (you're so bad!)
      
      Because nothing prevents CPU1 from loading ptr_key before loading
      have_filled_random_ptr_key.
      
      This race is very unlikely, but we can't keep an incorrect smp_mb() in
      place. Instead, replace the have_filled_random_ptr_key with a static_branch
      not_filled_random_ptr_key, that is initialized to true and changed to false
      when we get enough entropy. If the update happens in early boot, the
      static_key is updated immediately. Otherwise it has to wait until
      entropy is filled, which happens in an interrupt handler that can't
      enable a static_key, as that requires a preemptible context. In that case, a
      work_queue is used to enable it; as entropy already took too long to
      establish in the first place, waiting a little more shouldn't hurt anything.
      
      The benefit of using the static key is that the unlikely branch in
      vsprintf() now becomes a nop.
      
      Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      85f4f12d
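      The vsprintf commit above points out that a barrier on the writer side
      is only useful when the reader orders its loads as well. The sketch
      below shows that pairing in portable C11, with a release store matched
      by an acquire load. This is a different technique from the static_key
      the patch actually uses, shown only to illustrate the ordering problem;
      all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t demo_ptr_key;		/* plain data, published via the flag */
static atomic_bool demo_key_ready;	/* zero-initialized, i.e. false */

/* Writer: fill the key, then publish the flag with release ordering. */
static void demo_publish_key(uint64_t key)
{
	demo_ptr_key = key;
	atomic_store_explicit(&demo_key_ready, true, memory_order_release);
}

/*
 * Reader: the acquire load keeps the read of demo_ptr_key from being
 * performed before the flag is observed, which is the ordering the
 * original reader lacked.
 */
static bool demo_read_key(uint64_t *out)
{
	if (!atomic_load_explicit(&demo_key_ready, memory_order_acquire))
		return false;
	*out = demo_ptr_key;
	return true;
}

static void demo_key_check(void)
{
	uint64_t key = 0;

	assert(!demo_read_key(&key));	/* nothing published yet */
	demo_publish_key(0x1234);
	assert(demo_read_key(&key) && key == 0x1234);
}
```

      The acquire load costs something on every call on weakly ordered CPUs,
      which is the kind of overhead the commit avoids by patching the branch
      out once the key is ready.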
    • Michel Thierry's avatar
      drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk · b579f924
      Michel Thierry authored
      Factor in clear values wherever required while updating destination
      min/max.
      
      References: HSDES#1604444184
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Cc: mesa-dev@lists.freedesktop.org
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180510200708.18097-1-michel.thierry@intel.com
      Cc: stable@vger.kernel.org
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180514165445.9198-1-michel.thierry@intel.com
      (backported from commit 0c79f9cb)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      b579f924
    • Deepak Rawat's avatar
      drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful · 91ba9f28
      Deepak Rawat authored
      SOU primary plane prepare_fb hook depends upon dmabuf_size to pin up BO
      (and not call a new vmw_dmabuf_init) when a new fb size is the same as
      the current fb. This was changed in a recent commit, which is causing
      page_flip to fail on VMs with low display memory and multi-mon failure
      when cycling monitors from the secondary display.
      
      Cc: <stable@vger.kernel.org> # 4.14, 4.16
      Fixes: 20fb5a63 ("drm/vmwgfx: Unpin the screen object backup buffer when not used")
      Signed-off-by: Deepak Rawat <drawat@vmware.com>
      Reviewed-by: Sinclair Yeh <syeh@vmware.com>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      91ba9f28
  5. 15 May, 2018 4 commits
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 21b9f1c7
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
       "Here's a set of patches that fix a number of bugs in the in-kernel AFS
        client, including:
      
         - Fix directory locking to not use individual page locks for
           directory reading/scanning but rather to use a semaphore on the
           afs_vnode struct as the directory contents must be read in a single
           blob and data from different reads must not be mixed as the entire
           contents may be shuffled about between reads.
      
         - Fix address list parsing to handle port specifiers correctly.
      
         - Only give up callback records on a server if we actually talked to
           that server (we might not be able to access a server).
      
         - Fix some callback handling bugs, including refcounting,
           whole-volume callbacks and when callbacks actually get broken in
           response to a CB.CallBack op.
      
         - Fix some server/address rotation bugs, including giving up if we
           can't probe a server; giving up if a server says it doesn't have a
           volume, but there are more servers to try.
      
         - Fix the decoding of fetched statuses to be OpenAFS compatible.
      
         - Fix the handling of server lookups in Cache Manager ops (such as
           CB.InitCallBackState3) to use a UUID if possible and to handle no
           server being found.
      
         - Fix a bug in server lookup where not all addresses are compared.
      
         - Fix the non-encryption of calls that prevents some servers from
           being accessed (this also requires an AF_RXRPC patch that has
           already gone in through the net tree).
      
        There's also a patch that adds tracepoints to log Cache Manager ops
        that don't find a matching server, either by UUID or by address"
      
      * tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix the non-encryption of calls
        afs: Fix CB.CallBack handling
        afs: Fix whole-volume callback handling
        afs: Fix afs_find_server search loop
        afs: Fix the handling of an unfound server in CM operations
        afs: Add a tracepoint to record callbacks from unlisted servers
        afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
        afs: Fix VNOVOL handling in address rotation
        afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
        afs: Fix server rotation's handling of fileserver probe failure
        afs: Fix refcounting in callback registration
        afs: Fix giving up callbacks on server destruction
        afs: Fix address list parsing
        afs: Fix directory page locking
      21b9f1c7
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · eeba2dfa
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two small driver fixes: aacraid to fix an unknown IU type on task
        management functions which causes a firmware fault and vmw_pvscsi to
        change a return code to retry the operation instead of causing an
        immediate error"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: aacraid: Correct hba_send to include iu_type
        scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
      eeba2dfa
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux · ee4b65c2
      Linus Torvalds authored
      Pull drm fix from Dave Airlie:
       "This fixes the mmap regression reported to me on irc by an i686 kernel
        user today, he's tested the fix works, and I've audited all the drm
        drivers for the bad mmap usage and since we use the mmap offset as a
        lookup in a table we aren't inclined to have anything bad in there"
      
      [ See commit be83bbf8 ("mmap: introduce sane default mmap limits")
        for details and the note on why the GPU drivers were expected to be a
        special case.    - Linus ]
      
      * tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux:
        drm: set FMODE_UNSIGNED_OFFSET for drm files
      ee4b65c2
    • Geert Uytterhoeven's avatar
      mtd: rawnand: Fix return type of __DIVIDE() when called with 32-bit · 9f825e74
      Geert Uytterhoeven authored
      The __DIVIDE() macro checks whether it is called with a 32-bit or 64-bit
      dividend, to select the appropriate divide-and-round-up routine.
      As the check uses the ternary operator, the result will always be
      promoted to a type that can hold both results, i.e. unsigned long long.
      
      When using this result in a division on a 32-bit system, this may lead
      to link errors like:
      
          ERROR: "__udivdi3" [drivers/mtd/nand/raw/nand.ko] undefined!
      
      Fix this by casting the result of the division to the type of the
      dividend.
      
      Fixes: 8878b126 ("mtd: nand: add ->exec_op() implementation")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
      9f825e74
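      The type promotion described in the mtd/rawnand commit above is easy to
      observe with sizeof. The sketch below uses hypothetical DEMO_* macros
      rather than the driver's __DIVIDE(), and relies on __typeof__, which is
      a GCC/Clang extension. It shows that a ternary mixing a 32-bit and a
      64-bit arm always has the 64-bit type, and that casting back to the
      dividend's type restores the narrow result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical round-up divisions in 32-bit and 64-bit arithmetic. */
#define DEMO_DIV_UP32(x, f) (((uint32_t)(x) + (f) - 1) / (f))
#define DEMO_DIV_UP64(x, f) (((uint64_t)(x) + (f) - 1) / (f))

/* Both arms are converted to a common type, so the result is 64-bit. */
#define DEMO_DIVIDE_NAIVE(x, f) \
	(sizeof(x) <= sizeof(uint32_t) ? DEMO_DIV_UP32(x, f) : DEMO_DIV_UP64(x, f))

/* Casting to the dividend's type keeps 32-bit callers in 32-bit math. */
#define DEMO_DIVIDE_CAST(x, f) ((__typeof__(x))DEMO_DIVIDE_NAIVE(x, f))

static void demo_divide_check(void)
{
	uint32_t small = 10;
	uint64_t big = 10;

	/* Same value either way: ceil(10 / 4) == 3. */
	assert(DEMO_DIVIDE_NAIVE(small, 4) == 3);
	assert(DEMO_DIVIDE_CAST(small, 4) == 3);

	/* The naive result is 64 bits wide even for a 32-bit dividend. */
	assert(sizeof(DEMO_DIVIDE_NAIVE(small, 4)) == sizeof(uint64_t));
	assert(sizeof(DEMO_DIVIDE_CAST(small, 4)) == sizeof(uint32_t));
	assert(sizeof(DEMO_DIVIDE_CAST(big, 4)) == sizeof(uint64_t));
}
```

      Dividing the 64-bit naive result on a 32-bit target is what pulls in
      the __udivdi3 helper named in the link error.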