1. 09 Sep, 2018 40 commits
    • userns: move user access out of the mutex · 656d6e6f
      Jann Horn authored
      commit 5820f140 upstream.
      
      The old code would hold the userns_state_mutex indefinitely if
      memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
      moving the memdup_user_nul in front of the mutex_lock().
      
      Note: This changes the error precedence of invalid buf/count/*ppos vs
      map already written / capabilities missing.
      
      Fixes: 22d917d8 ("userns: Rework the user_namespace adding uid/gid...")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sys: don't hold uts_sem while accessing userspace memory · b692c405
      Jann Horn authored
      commit 42a0cc34 upstream.
      
      Holding uts_sem as a writer while accessing userspace memory allows a
      namespace admin to stall all processes that attempt to take uts_sem.
      Instead, move data through stack buffers and don't access userspace memory
      while uts_sem is held.
      
      Cc: stable@vger.kernel.org
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iommu/vt-d: Fix dev iotlb pfsid use · c2ea292b
      Jacob Pan authored
      commit 1c48db44 upstream.
      
      PFSID should be used in the invalidation descriptor for flushing
      device IOTLBs on SRIOV VFs.

      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: stable@vger.kernel.org
      Cc: "Ashok Raj" <ashok.raj@intel.com>
      Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iommu/vt-d: Add definitions for PFSID · eb58c404
      Jacob Pan authored
      commit 0f725561 upstream.
      
      When SRIOV VF device IOTLB is invalidated, we need to provide
      the PF source ID such that IOMMU hardware can gauge the depth
      of invalidation queue which is shared among VFs. This is needed
      when device invalidation throttle (DIT) capability is supported.
      
      This patch adds bit definitions for checking and tracking PFSID.

      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: stable@vger.kernel.org
      Cc: "Ashok Raj" <ashok.raj@intel.com>
      Cc: "Lu Baolu" <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/tlb: Remove tlb_remove_table() non-concurrent condition · 7cf82f3b
      Peter Zijlstra authored
      commit a6f57208 upstream.
      
      Will noted that only checking mm_users is incorrect; we should also
      check mm_count in order to cover CPUs that have a lazy reference to
      this mm (and could do speculative TLB operations).
      
      If removing this turns out to be a performance issue, we can
      re-instate a more complete check, but in tlb_table_flush() eliding the
      call_rcu_sched().
      
      Fixes: 26723911 ("mm, powerpc: move the RCU page-table freeing into generic code")
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rik van Riel <riel@surriel.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: tegra: Fix Tegra30 Cardhu PCA954x reset · ddcb9270
      Jon Hunter authored
      commit 6e181190 upstream.
      
      On all versions of Tegra30 Cardhu, the reset signal to the NXP PCA9546
      I2C mux is connected to the Tegra GPIO BB0. Currently, this pin on the
      Tegra is not configured as a GPIO but as a special-function IO (SFIO)
      that is multiplexing the pin to an I2S controller. On exiting system
      suspend, I2C commands sent to the PCA9546 are failing because there is
      no ACK. Although it is not possible to see exactly what is happening
      to the reset during suspend, by ensuring it is configured as a GPIO
      and driven high, to de-assert the reset, the failures are no longer
      seen.
      
      Please note that this GPIO is also used to drive the reset signal
      going to the camera connector on the board. However, given that there
      is no camera support currently for Cardhu, this should not have any
      impact.
      
      Fixes: 40431d16 ("ARM: tegra: enable PCA9546 on Cardhu")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence() · d453f04e
      Trond Myklebust authored
      commit 8618289c upstream.
      
      We must drop the lock before we can sleep in referring_call_exists().

      Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Fixes: 045d2a6d ("NFSv4.1: Delay callback processing...")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4: Fix locking in pnfs_generic_recover_commit_reqs · c5759d5a
      Trond Myklebust authored
      commit d0fbb1d8 upstream.
      
      The use of the inode->i_lock was converted to a mutex, but we forgot
      to remove the old inode unlock/lock() pair that allowed the layout
      segment to be put inside the loop.

      Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Fixes: e824f99a ("NFSv4: Use a mutex to protect the per-inode commit...")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4 client live hangs after live data migration recovery · bf23ba37
      Bill Baker authored
      commit 0f90be13 upstream.
      
      After a live data migration event at the NFS server, the client may send
      I/O requests to the wrong server, causing a live hang due to repeated
      recovery events.  On the wire, this will appear as an I/O request failing
      with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
      NFS4ERR_BADSESSION is returned because the session ID being used was
      issued by the other server and is not valid at the old server.
      
      The failure is caused by async worker threads having cached the transport
      (xprt) in the rpc_task structure.  After the migration recovery completes,
      the task is redispatched and the task resends the request to the wrong
      server based on the old value still present in tk_xprt.
      
      The solution is to recompute the tk_xprt field of the rpc_task structure
      so that the request goes to the correct server.

      Signed-off-by: Bill Baker <bill.baker@oracle.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Helen Chao <helen.chao@oracle.com>
      Fixes: fb43d172 ("SUNRPC: Use the multipath iterator to assign a ...")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pnfs/blocklayout: off by one in bl_map_stripe() · ec13c53d
      Dan Carpenter authored
      commit 0914bb96 upstream.
      
      "dev->nr_children" is the number of children which were parsed
      successfully in bl_parse_stripe().  It could be all of them and then, in
      that case, it is equal to v->stripe.volumes_count.  Either way, the >
      should be >= so that we don't go beyond the end of what we're supposed
      to.
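
The difference between the two comparisons can be seen in a small standalone sketch. This is not the kernel's bl_map_stripe(); the struct and both helper names are hypothetical and exist only to show why an index equal to the element count has to be rejected.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a striped device: only the first
 * nr_children entries of children[] were parsed successfully. */
struct stripe_dev {
    int children[MAX_CHILDREN];
    size_t nr_children;
};

/* Buggy check: idx == nr_children slips through, one past the end. */
int index_ok_buggy(const struct stripe_dev *dev, size_t idx)
{
    return !(idx > dev->nr_children);
}

/* Fixed check: valid indices are 0 .. nr_children - 1. */
int index_ok_fixed(const struct stripe_dev *dev, size_t idx)
{
    return !(idx >= dev->nr_children);
}
```

With three parsed children, index 3 passes the buggy check and is rejected by the fixed one.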
      
      Fixes: 5c83746a ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org # 3.17+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • block, bfq: return nbytes and not zero from struct cftype .write() method · ed480f2b
      Maciej S. Szmigiero authored
      commit fc8ebd01 upstream.
      
      The value that struct cftype .write() method returns is then directly
      returned to userspace as the value returned by write() syscall, so it
      should be the number of bytes actually written (or consumed) and not zero.
      
      Returning zero from write() syscall makes programs like /bin/echo or bash
      spin.
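
Why a zero return makes callers spin can be sketched in userspace C. This is an illustration under assumptions, not the bfq code: the two handlers and write_all() are hypothetical, and the max_calls cap exists only so the demonstration terminates where a real retry loop would not.

```c
#include <stddef.h>

typedef long (*write_fn)(const char *buf, size_t nbytes);

/* Buggy handler: accepts the data but reports 0 bytes consumed. */
long handler_returns_zero(const char *buf, size_t nbytes)
{
    (void)buf;
    (void)nbytes;
    return 0;
}

/* Fixed handler: reports how many bytes it consumed. */
long handler_returns_nbytes(const char *buf, size_t nbytes)
{
    (void)buf;
    return (long)nbytes;
}

/* Retry loop in the style of a "write everything" helper.
 * Returns the number of calls made, or -1 on error or when
 * max_calls is reached without finishing. */
int write_all(write_fn fn, const char *buf, size_t len, int max_calls)
{
    size_t done = 0;
    int calls = 0;

    while (done < len) {
        long n;

        if (calls == max_calls)
            return -1;  /* an uncapped caller would keep looping here */
        n = fn(buf + done, len - done);
        calls++;
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return calls;
}
```

The nbytes handler completes in one call; the zero handler never advances `done`, so the loop only ends because of the artificial cap.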
      Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xtensa: increase ranges in ___invalidate_{i,d}cache_all · fe806eb5
      Max Filippov authored
      commit fec3259c upstream.
      
      Cache invalidation macros use cache line size to iterate over
      invalidated cache lines, assuming that all cache ways are invalidated by
      single instruction, but xtensa ISA recommends to not assume that for
      future compatibility:
        In some implementations all ways at index Addry-1..z are invalidated
        regardless of the specified way, but for future compatibility this
        behavior should not be assumed.
      
      Iterate over all cache ways in ___invalidate_icache_all and
      ___invalidate_dcache_all.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xtensa: limit offsets in __loop_cache_{all,page} · 0d78efe0
      Max Filippov authored
      commit be75de25 upstream.
      
      When building kernel for xtensa cores with big cache lines (e.g. 128
      bytes or more) __loop_cache_all and __loop_cache_page may generate
      assembly instructions with immediate fields that are too big. This
      results in the following build errors:
      
        arch/xtensa/mm/misc.S: Assembler messages:
        arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '256'
        arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '384'
        arch/xtensa/kernel/head.S: Assembler messages:
        arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '256'
        arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '384'
        arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '256'
        arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '384'
        arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '256'
        arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '384'
      
      Add parameter max_immed to these macros and use it to limit values of
      immediate operands. Extract common code of these macros into the new
      macro __loop_cache_unroll.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages · 025cc91f
      Paul Mackerras authored
      commit 8cfbdbdc upstream.
      
      Commit 76fa4975 ("KVM: PPC: Check if IOMMU page is contained in
      the pinned physical page", 2018-07-17) added some checks to ensure
      that guest DMA mappings don't attempt to map more than the guest is
      entitled to access. However, errors in the logic mean that legitimate
      guest requests to map pages for DMA are being denied in some
      situations. Specifically, if the first page of the range passed to
      mm_iommu_get() is mapped with a normal page, and subsequent pages are
      mapped with transparent huge pages, we end up with mem->pageshift ==
      0. That means that the page size checks in mm_iommu_ua_to_hpa() and
      mm_iommu_ua_to_hpa_rm() will always fail for every page in that
      region, and thus the guest can never map any memory in that region for
      DMA, typically leading to a flood of error messages like this:
      
        qemu-system-ppc64: VFIO_MAP_DMA: -22
        qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)
      
      The logic errors in mm_iommu_get() are:
      
        (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
            call (meaning that find_linux_pte() returns the pte for the
            first address in the range, not the address we are currently up
            to);
        (b) use of 'pageshift' as the variable to receive the hugepage shift
            returned by find_linux_pte() - for a normal page this gets set
            to 0, leading to us setting mem->pageshift to 0 when we conclude
            that the pte returned by find_linux_pte() didn't match the page
            we were looking at;
        (c) comparing 'compshift', which is a page order, i.e. log base 2 of
            the number of pages, with 'pageshift', which is a log base 2 of
            the number of bytes.
      
      To fix these problems, this patch introduces 'cur_ua' to hold the
      current user address and uses that in the find_linux_pte() call;
      introduces 'pteshift' to hold the hugepage shift found by
      find_linux_pte(); and compares 'pteshift' with 'compshift +
      PAGE_SHIFT' rather than 'compshift'.
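
Point (c) is a units mismatch, which a short sketch can make concrete. The helper names and DEMO_PAGE_SHIFT are hypothetical; the value 16 assumes 64 KiB base pages purely for illustration and is not taken from the patch.

```c
/* Assumed for illustration: 64 KiB base pages, so the page shift is 16. */
#define DEMO_PAGE_SHIFT 16u

/* Buggy comparison: compshift counts pages (log2), pteshift counts
 * bytes (log2), so the two are never equal for a real huge page. */
int same_size_buggy(unsigned int compshift, unsigned int pteshift)
{
    return pteshift == compshift;
}

/* Fixed comparison: convert the page order to a byte shift first. */
int same_size_fixed(unsigned int compshift, unsigned int pteshift)
{
    return pteshift == compshift + DEMO_PAGE_SHIFT;
}
```

Under that assumption, a 16 MiB huge page has a page order of 8 and a byte shift of 24, so only the second comparison recognizes the two values as describing the same page size.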
      
      The patch also moves the local_irq_restore to the point after the PTE
      pointer returned by find_linux_pte() has been dereferenced because
      otherwise the PTE could change underneath us, and adds a check to
      avoid doing the find_linux_pte() call once mem->pageshift has been
      reduced to PAGE_SHIFT, as an optimization.
      
      Fixes: 76fa4975 ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: VMX: fixes for vmentry_l1d_flush module parameter · 58936d4d
      Paolo Bonzini authored
      commit 0027ff2a upstream.
      
      Two bug fixes:
      
      1) missing entries in the l1d_param array; this can cause a host crash
      if an access attempts to reach the missing entry. Future-proof the get
      function against any overflows as well.  However, the two entries
      VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
      not be accepted by the parse function, so disable them there.
      
      2) invalid values must be rejected even if the CPU does not have the
      bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
      
      ... and a small refactoring, since the .cmd field is redundant with
      the index in the array.

      Reported-by: Bandan Das <bsd@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: a7b9020b
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / sleep: wakeup: Fix build error caused by missing SRCU support · 015156f5
      zhangyi (F) authored
      commit 3df6f61f upstream.
      
      Commit ea0212f4 (power: auto select CONFIG_SRCU) made the code in
      drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to
      select CONFIG_SRCU in Kconfig, which leads to the following build
      error if CONFIG_SRCU is not selected somewhere else:
      
      drivers/built-in.o: In function `wakeup_source_remove':
      (.text+0x3c6fc): undefined reference to `synchronize_srcu'
      drivers/built-in.o: In function `pm_print_active_wakeup_sources':
      (.text+0x3c7a8): undefined reference to `__srcu_read_lock'
      drivers/built-in.o: In function `pm_print_active_wakeup_sources':
      (.text+0x3c84c): undefined reference to `__srcu_read_unlock'
      drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
      (.text+0x3d1d8): undefined reference to `__srcu_read_lock'
      drivers/built-in.o: In function `device_wakeup_arm_wake_irqs':
      (.text+0x3d228): undefined reference to `__srcu_read_unlock'
      drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
      (.text+0x3d24c): undefined reference to `__srcu_read_lock'
      drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs':
      (.text+0x3d29c): undefined reference to `__srcu_read_unlock'
      drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu'
      
      Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled.
      
      Fixes: ea0212f4 (power: auto select CONFIG_SRCU)
      Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      [ rjw: Minor subject/changelog fixups ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cpufreq: governor: Avoid accessing invalid governor_data · 924383ed
      Henry Willard authored
      commit 2a3eb51e upstream.
      
      If cppc_cpufreq.ko is deleted at the same time that tuned-adm is
      changing profiles, there is a small chance that a race can occur
      between cpufreq_dbs_governor_exit() and cpufreq_dbs_governor_limits()
      resulting in a system failure when the latter tries to use
      policy->governor_data that has been freed by the former.
      
      This patch uses gov_dbs_data_mutex to synchronize access.
      
      Fixes: e788892b (cpufreq: governor: Get rid of governor events)
      Signed-off-by: Henry Willard <henry.willard@oracle.com>
      [ rjw: Subject, minor white space adjustment ]
      Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drivers/block/zram/zram_drv.c: fix bug storing backing_dev · 256f63f5
      Peter Kalauskas authored
      commit c8bd134a upstream.
      
      The call to strlcpy in backing_dev_store is incorrect. It should take
      the size of the destination buffer instead of the size of the source
      buffer.  Additionally, ignore the newline character (\n) when reading
      the new file_name buffer. This makes it possible to set the backing_dev
      as follows:
      
      	echo /dev/sdX > /sys/block/zram0/backing_dev
      
      The reason it worked before was the fact that strlcpy() copies 'len - 1'
      bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
      copy the trailing new line symbol.  Which also means that "echo -n
      /dev/sdX" most likely was broken.
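
The two parts of the fix, bounding the copy by the destination size and dropping the trailing newline, can be sketched in portable C. This is not the zram driver code: copy_bounded() is a hand-written stand-in for strlcpy(), which not every C library provides, and store_name() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* strlcpy-style copy: copies at most size - 1 bytes and always
 * NUL-terminates when size > 0. Returns strlen(src). */
size_t copy_bounded(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Store a name as written by "echo": bound the copy by the
 * destination size, then drop one trailing newline if present. */
void store_name(char *dst, size_t dst_size, const char *buf)
{
    size_t n;

    if (dst_size == 0)
        return;
    copy_bounded(dst, buf, dst_size);
    n = strlen(dst);
    if (n > 0 && dst[n - 1] == '\n')
        dst[n - 1] = '\0';
}
```

Written this way, input with and without a trailing newline yields the same stored name, and an oversized input is truncated to fit the destination instead of being sized by the source.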
      Signed-off-by: Peter Kalauskas <peskal@google.com>
      Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.google.com
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: <stable@vger.kernel.org>    [4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ovl: fix wrong use of impure dir cache in ovl_iterate() · 8840ca57
      Amir Goldstein authored
      commit 67810693 upstream.
      
      Only upper dir can be impure, but if we are in the middle of
      iterating a lower real dir, dir could be copied up and marked
      impure. We only want the impure cache if we started iterating
      a real upper dir to begin with.
      
      Aditya Kali reported that the following reproducer hits the
      WARN_ON(!cache->refcount) in ovl_get_cache():
      
       docker run --rm drupal:8.5.4-fpm-alpine \
          sh -c 'cd /var/www/html/vendor/symfony && \
                 chown -R www-data:www-data . && ls -l .'
      Reported-by: Aditya Kali <adityakali@google.com>
      Tested-by: Aditya Kali <adityakali@google.com>
      Fixes: 4edb83bb ('ovl: constant d_ino for non-merge dirs')
      Cc: <stable@vger.kernel.org> # v4.14
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mfd: hi655x: Fix regmap area declared size for hi655x · aa9ceea2
      Rafael David Tinoco authored
      commit 6afebb70 upstream.
      
      Fixes https://bugs.linaro.org/show_bug.cgi?id=3903
      
      LTP Functional tests have caused a bad paging request when triggering
      the regmap_read_debugfs() logic of the device PMIC Hi6553 (reading
      regmap/f8000000.pmic/registers file during read_all test):
      
      Unable to handle kernel paging request at virtual address ffff0
      [ffff00000984e000] pgd=0000000077ffe803, pud=0000000077ffd803,0
      Internal error: Oops: 96000007 [#1] SMP
      ...
      Hardware name: HiKey Development Board (DT)
      ...
      Call trace:
       regmap_mmio_read8+0x24/0x40
       regmap_mmio_read+0x48/0x70
       _regmap_bus_reg_read+0x38/0x48
       _regmap_read+0x68/0x170
       regmap_read+0x50/0x78
       regmap_read_debugfs+0x1a0/0x308
       regmap_map_read_file+0x48/0x58
       full_proxy_read+0x68/0x98
       __vfs_read+0x48/0x80
       vfs_read+0x94/0x150
       SyS_read+0x6c/0xd8
       el0_svc_naked+0x30/0x34
      Code: aa1e03e0 d503201f f9400280 8b334000 (39400000)
      
      Investigations have shown that, when triggered by the debugfs read()
      handler, the mmio regmap logic was reading a bigger (16k) register area
      than the one mapped by devm_ioremap_resource() during hi655x-pmic probe
      time (4k).
      
      This commit changes hi655x's max register, according to HW specs, to be
      the same as the one declared in the pmic device in hi6220's dts, fixing
      the issue.
      
      Cc: <stable@vger.kernel.org> #v4.9 #v4.14 #v4.16 #v4.17
      Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • uprobes: Use synchronize_rcu() not synchronize_sched() · 4f6789ca
      Steven Rostedt (VMware) authored
      commit 016f8ffc upstream.
      
      While debugging another bug, I was looking at all the synchronize*()
      functions being used in kernel/trace, and noticed that trace_uprobes was
      using synchronize_sched(), with a comment to synchronize with
      {u,ret}_probe_trace_func(). When looking at those functions, the data is
      protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
      is using the wrong synchronize_*() function.
      
      Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 70ed91c6 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • livepatch: Validate module/old func name length · a36e2aa9
      Kamalesh Babulal authored
      commit 6e9df95b upstream.
      
      A livepatch module author can pass a module name or old function name
      longer than the defined character limit. With obj->name length greater
      than MODULE_NAME_LEN, the livepatch module gets loaded but waits forever
      on the module specified by obj->name to be loaded. It also populates a
      /sys directory with an untruncated object name.

      In the case of funcs->old_name length greater than KSYM_NAME_LEN, it
      would not match against any of the symbol table entries. Instead, the
      code loops through the whole symbol table comparing entries against a
      nonexistent function, which can be avoided.

      The same issues apply to misspelled/incorrect names. At least gatekeep
      the modules with over-the-limit string lengths by checking their
      length during livepatch module registration.
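
A registration-time length check of this kind might look like the following sketch. It is not the livepatch code: check_names() is hypothetical, the DEMO_ limits are placeholder values rather than the kernel's MODULE_NAME_LEN and KSYM_NAME_LEN, and treating a NULL object name as "no module name to check" is an assumption of the sketch.

```c
#include <errno.h>
#include <string.h>

/* Placeholder limits for the sketch; the kernel's MODULE_NAME_LEN and
 * KSYM_NAME_LEN have their own values. */
#define DEMO_MODULE_NAME_LEN 56
#define DEMO_KSYM_NAME_LEN   128

/* Reject names that cannot fit in a buffer of the given limit,
 * including the terminating NUL. Returns 0 or -EINVAL. */
int check_names(const char *obj_name, const char *old_func_name)
{
    /* Assumption: a NULL obj_name means there is no module name to check. */
    if (obj_name && strlen(obj_name) >= DEMO_MODULE_NAME_LEN)
        return -EINVAL;
    if (strlen(old_func_name) >= DEMO_KSYM_NAME_LEN)
        return -EINVAL;
    return 0;
}
```

Failing early like this avoids both the endless wait on a module that can never match and the pointless walk over the symbol table.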
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • printk/tracing: Do not trace printk_nmi_enter() · 68a735eb
      Steven Rostedt (VMware) authored
      commit d1c392c9 upstream.
      
      I hit the following splat in my tests:
      
      ------------[ cut here ]------------
      IRQs not enabled as expected
      WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
      Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      EIP: tick_nohz_idle_enter+0x44/0x8c
      Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
      75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
      e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
      EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
      ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
      CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
      Call Trace:
       do_idle+0x33/0x202
       cpu_startup_entry+0x61/0x63
       start_secondary+0x18e/0x1ed
       startup_32_smp+0x164/0x168
      irq event stamp: 18773830
      hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
      hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
      softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
      softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
      ---[ end trace b7c64aa79e17954a ]---
      
      After a bit of debugging, I found what was happening. This would trigger
      when performing "perf" with a high NMI interrupt rate, while enabling and
      disabling function tracer. Ftrace uses breakpoints to convert the nops at
      the start of functions to calls to the function trampolines. The breakpoint
      traps disable interrupts and this makes calls into lockdep via the
      trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
      
        do_idle {
      
          [interrupts enabled]
      
          <interrupt> [interrupts disabled]
      	TRACE_IRQS_OFF [lockdep says irqs off]
      	[...]
      	TRACE_IRQS_IRET
      	    test if pt_regs say return to interrupts enabled [yes]
      	    TRACE_IRQS_ON [lockdep says irqs are on]
      
      	    <nmi>
      		nmi_enter() {
      		    printk_nmi_enter() [traced by ftrace]
      		    [ hit ftrace breakpoint ]
      		    <breakpoint exception>
      			TRACE_IRQS_OFF [lockdep says irqs off]
      			[...]
      			TRACE_IRQS_IRET [return from breakpoint]
      			   test if pt_regs say interrupts enabled [no]
      			   [iret back to interrupt]
      	   [iret back to code]
      
          tick_nohz_idle_enter() {
      
      	lockdep_assert_irqs_enabled() [lockdep say no!]
      
      Although interrupts are indeed enabled, lockdep thinks they are not, and since
      we now do asserts via lockdep, it gives a false warning. The issue here is
      that printk_nmi_enter() is called before lockdep_off(), which disables
      lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
      printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
      confused.
      
      Cc: stable@vger.kernel.org
      Fixes: 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI")
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing/blktrace: Fix to allow setting same value · cbde057a
      Steven Rostedt (VMware) authored
      commit 757d9140 upstream.
      
      Masami Hiramatsu reported:
      
        Current trace-enable attribute in sysfs returns an error
        if user writes the same setting value as current one,
        e.g.
      
          # cat /sys/block/sda/trace/enable
          0
          # echo 0 > /sys/block/sda/trace/enable
          bash: echo: write error: Invalid argument
          # echo 1 > /sys/block/sda/trace/enable
          # echo 1 > /sys/block/sda/trace/enable
          bash: echo: write error: Device or resource busy
      
        But this is not a preferred behavior, it should ignore
        if new setting is same as current one. This fixes the
        problem as below.
      
          # cat /sys/block/sda/trace/enable
          0
          # echo 0 > /sys/block/sda/trace/enable
          # echo 1 > /sys/block/sda/trace/enable
          # echo 1 > /sys/block/sda/trace/enable
      
      Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: cd649b8b ("blktrace: remove sysfs_blk_trace_enable_show/store()")
      Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbde057a
    • Steven Rostedt (VMware)'s avatar
      tracing: Do not call start/stop() functions when tracing_on does not change · 4c901675
      Steven Rostedt (VMware) authored
      commit f143641b upstream.
      
      Currently, when one echoes 1 into tracing_on, the current tracer's
      "start()" function is executed, even if tracing_on was already one. This can
      lead to strange side effects. One is that if the hwlat tracer is enabled
      and someone does "echo 1 > tracing_on", the hwlat tracer's start()
      function is called again, which creates another kernel thread and makes
      it impossible to remove the old one.
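
A minimal user-space sketch of the idea, not the kernel code itself: the
start/stop hooks run only when the requested state differs from the current
one. The names tracing_on_write, demo_tracer and threads_created are
hypothetical; the counter stands in for the kernel thread that hwlat creates.

```c
#include <stdbool.h>

/* Hypothetical tracer with start/stop hooks; names are illustrative only. */
struct tracer {
    void (*start)(void);
    void (*stop)(void);
};

int threads_created;            /* stands in for hwlat's kernel thread */
static bool tracing_is_on;      /* current state of the tracing_on switch */

void demo_start(void) { threads_created++; }
void demo_stop(void)  { threads_created--; }

const struct tracer demo_tracer = { demo_start, demo_stop };

/* Apply a write to tracing_on; hooks run only on an actual state change. */
void tracing_on_write(const struct tracer *t, unsigned long val)
{
    bool want = val != 0;

    if (want == tracing_is_on)
        return;                 /* already in the requested state */

    tracing_is_on = want;
    if (want) {
        if (t->start)
            t->start();
    } else {
        if (t->stop)
            t->stop();
    }
}
```

With this shape, writing 1 twice in a row calls start() once, so only one
thread exists to be torn down later.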
      
      Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de
      
      Cc: stable@vger.kernel.org
      Fixes: 2df8f8a6 ("tracing: Fix regression with irqsoff tracer and tracing_on file")
      Reported-by: Erica Bugden <erica.bugden@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c901675
    • Johan Hovold's avatar
      rtc: omap: fix potential crash on power off · 2b4c940d
      Johan Hovold authored
      commit 5c8b84f4 upstream.
      
      Do not set the system power-off callback and omap power-off rtc pointer
      until we're done setting up our device to avoid leaving stale pointers
      around after a late probe error.
      
      Fixes: 97ea1906 ("rtc: omap: Support ext_wakeup configuration")
      Cc: stable <stable@vger.kernel.org>     # 4.9
      Cc: Marcin Niestroj <m.niestroj@grinn-global.com>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Acked-by: Tony Lindgren <tony@atomide.com>
      Reviewed-by: Marcin Niestroj <m.niestroj@grinn-global.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b4c940d
    • Nadav Amit's avatar
      vmw_balloon: fix VMCI use when balloon built into kernel · bbac5374
      Nadav Amit authored
      commit c3cc1b0f upstream.
      
      Currently, when all modules, including VMCI and VMware balloon are built
      into the kernel, the initialization of the balloon happens before the
      VMCI is probed. As a result, the balloon fails to initialize the VMCI
      doorbell, which it uses to get asynchronous requests for balloon size
      changes.
      
      The problem can be seen in the logs, in the form of the following
      message:
      	"vmw_balloon: failed to initialize vmci doorbell"
      
      The driver would work correctly but slightly less efficiently, probing
      for requests periodically. This patch changes the balloon to be
      initialized using late_initcall() instead of module_init() to address
      this issue. It does not address a situation in which VMCI is built as a
      module and the balloon is built into the kernel.
      
      Fixes: 48e3d668 ("VMware balloon: Enable notification via VMCI")
      Cc: stable@vger.kernel.org
      Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbac5374
    • Nadav Amit's avatar
      vmw_balloon: VMCI_DOORBELL_SET does not check status · 89667b26
      Nadav Amit authored
      commit ce664331 upstream.
      
      When vmballoon_vmci_init() sets a doorbell using VMCI_DOORBELL_SET, for
      some reason it does not consider the status and looks at the result.
      However, the hypervisor does not update the result - it updates the
      status. This might cause VMCI doorbell not to be enabled, resulting in
      degraded performance.
      
      Fixes: 48e3d668 ("VMware balloon: Enable notification via VMCI")
      Cc: stable@vger.kernel.org
      Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      89667b26
    • Nadav Amit's avatar
      vmw_balloon: do not use 2MB without batching · d3b40384
      Nadav Amit authored
      commit 5081efd1 upstream.
      
      If the hypervisor reports that 2MB batching is on while batching itself
      is cleared, the balloon code breaks. In this case the legacy mechanism is
      used with a 2MB page. The VM would report that a 2MB page is ballooned,
      and the hypervisor would only take the first 4KB.
      
      While the hypervisor should not report such settings, make the code more
      robust by not enabling 2MB support without batching.
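
The check described above can be sketched as follows. This is an
illustration under assumptions: the capability bit names and values
(CAP_BATCHED, CAP_BATCHED_2M) are hypothetical and do not come from the
driver; only the rule "2MB requires batching" is taken from the message.

```c
#include <stdint.h>

/* Hypothetical capability bits; the real driver's names/values may differ. */
#define CAP_BATCHED     (1u << 0)
#define CAP_BATCHED_2M  (1u << 1)

/*
 * Number of page sizes to use: 2 (4KB and 2MB) only when the hypervisor
 * reports both batching and 2MB batching, otherwise 1 (4KB only).
 */
unsigned int supported_page_sizes(uint32_t caps)
{
    if ((caps & CAP_BATCHED_2M) && (caps & CAP_BATCHED))
        return 2;
    return 1;
}
```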
      
      Fixes: 365bd7ef ("VMware balloon: Support 2m page ballooning.")
      Cc: stable@vger.kernel.org
      Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
      Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3b40384
    • Nadav Amit's avatar
      vmw_balloon: fix inflation of 64-bit GFNs · 9fd44e90
      Nadav Amit authored
      commit 09755690 upstream.
      
      When balloon batching is not supported by the hypervisor, the guest
      frame number (GFN) must fit in 32 bits. However, due to a bug, this check
      was mistakenly ignored. In practice, when total RAM is greater than
      16TB, the balloon does not work currently, making this bug unlikely to
      happen.
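
A small sketch of the 32-bit check, assuming 4KB pages (the helper names
are made up for illustration). With 4KB pages, frame numbers first need
more than 32 bits at 2^44 bytes, which appears to be where the 16TB figure
comes from.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12    /* assumption: 4KB pages */

/* Frame number of the page containing a guest-physical byte address. */
uint64_t addr_to_gfn(uint64_t addr)
{
    return addr >> PAGE_SHIFT_4K;
}

/* True when the frame number survives truncation to 32 bits unchanged. */
bool gfn_fits_32bit(uint64_t gfn)
{
    return gfn == (uint32_t)gfn;
}
```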
      
      Fixes: ef0f8f11 ("VMware balloon: partially inline vmballoon_reserve_page.")
      Cc: stable@vger.kernel.org
      Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fd44e90
    • Chanwoo Choi's avatar
      extcon: Release locking when sending the notification of connector state · c0a8e047
      Chanwoo Choi authored
      commit 8a9dbb77 upstream.
      
      Previously, extcon held the spinlock while calling notifier_call_chain()
      to prevent the task from being scheduled out and to avoid delaying the
      notification. With the spinlock held while sending the notification, a
      deadlock occurred on the extcon consumer device side. To avoid it, the
      extcon consumer device would always have to use a work item, and it is
      not always reasonable to use one.

      To fix this issue on the extcon consumer device, release the lock when
      sending the notification of connector state.
      
      Fixes: ab11af04 ("extcon: Add the synchronization extcon APIs to support the notification")
      Cc: stable@vger.kernel.org
      Cc: Roger Quadros <rogerq@ti.com>
      Cc: Kishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0a8e047
    • Lars-Peter Clausen's avatar
      iio: ad9523: Fix return value for ad952x_store() · 3f948190
      Lars-Peter Clausen authored
      commit 9a5094ca upstream.
      
      A sysfs write callback function needs to either return the number of
      consumed characters or an error.
      
      The ad952x_store() function currently returns 0 if the input value was "0",
      this will signal that no characters have been consumed and the function
      will be called repeatedly in a loop indefinitely. Fix this by returning
      number of supplied characters to indicate that the whole input string has
      been consumed.
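
To see why a zero return loops, here is a user-space sketch with a
simplified writer that keeps handing the unconsumed remainder back to the
handler. All names are hypothetical and the handlers only mimic the shape
of the bug and the fix; they are not the driver's code.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*store_fn)(const char *buf, size_t len);

/* Shape of the bug: an input of "0" reports zero bytes consumed. */
long store_returns_zero(const char *buf, size_t len)
{
    if (len == 0 || (buf[0] != '0' && buf[0] != '1'))
        return -EINVAL;
    if (buf[0] == '0')
        return 0;
    return (long)len;
}

/* Shape of the fix: the whole input is reported as consumed. */
long store_returns_len(const char *buf, size_t len)
{
    if (len == 0 || (buf[0] != '0' && buf[0] != '1'))
        return -EINVAL;
    return (long)len;
}

/*
 * Simplified writer: retries until every byte is consumed.
 * Returns 1 if the write completed within max_calls, 0 otherwise.
 * The cap exists only so the sketch terminates.
 */
int write_all(store_fn store, const char *buf, size_t len, int max_calls)
{
    size_t done = 0;
    int calls;

    for (calls = 0; calls < max_calls && done < len; calls++) {
        long n = store(buf + done, len - done);

        if (n < 0)
            return 0;
        done += (size_t)n;
    }
    return done == len;
}
```

Without the max_calls cap, the zero-returning handler would never let the
loop make progress on an input of "0".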
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
      Fixes: cd1678f9 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f948190
    • Lars-Peter Clausen's avatar
      iio: ad9523: Fix displayed phase · e4d3a251
      Lars-Peter Clausen authored
      commit 5a4e33c1 upstream.
      
      Fix the displayed phase for the ad9523 driver. Currently the most
      significant decimal place is dropped and all other digits are shifted one
      to the left. This is due to a multiplication by 10, which is not necessary,
      so remove it.
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
      Fixes: cd1678f9 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4d3a251
    • Gustavo A. R. Silva's avatar
      iio: sca3000: Fix missing return in switch · b8637491
      Gustavo A. R. Silva authored
      commit c5b974be upstream.
      
      The IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY case is missing a
      return and will fall through to the default case and erroneously
      return -EINVAL.

      Fix this by adding in missing *return ret*.
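
The pattern can be sketched in isolation as below. The enum and function
are hypothetical stand-ins, not the driver's write_raw callback; the point
is only that each handled case returns its own result instead of reaching
the default case.

```c
#include <errno.h>

/* Hypothetical stand-ins for the IIO channel info masks. */
enum chan_info {
    INFO_SAMP_FREQ,
    INFO_LOW_PASS_3DB,
    INFO_OTHER
};

/* Returns 0 for handled attributes and -EINVAL for anything else. */
int write_attr(enum chan_info mask)
{
    int ret;

    switch (mask) {
    case INFO_SAMP_FREQ:
        ret = 0;        /* pretend the hardware write succeeded */
        return ret;
    case INFO_LOW_PASS_3DB:
        ret = 0;        /* pretend the hardware write succeeded */
        return ret;     /* without this return, control reaches default */
    default:
        return -EINVAL;
    }
}
```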
      
      Fixes: 626f971b ("staging:iio:accel:sca3000 Add write support to the low pass filter control")
      Reported-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8637491
    • Dexuan Cui's avatar
      Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind() · 91b48a9c
      Dexuan Cui authored
      commit d3b26dd7 upstream.
      
      Before setting channel->rescind in vmbus_rescind_cleanup(), we should make
      sure the channel callback won't run any more, otherwise a high-level
      driver like pci_hyperv, which may be infinitely waiting for the host VSP's
      response and notices the channel has been rescinded, can't safely give
      up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's
      unsafe to exit from wait_for_response() and proceed with the on-stack
      variable "comp_pkt" popped. The issue was originally spotted by
      Michael Kelley <mikelley@microsoft.com>.
      
      In vmbus_close_internal(), the patch also minimizes the range protected by
      disabling/enabling channel->callback_event: we don't really need that for
      the whole function.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      91b48a9c
    • Tycho Andersen's avatar
      uart: fix race between uart_put_char() and uart_shutdown() · d286cfd4
      Tycho Andersen authored
      commit a5ba1d95 upstream.
      
      We have reports of the following crash:
      
          PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
          #0 [ffff88085c6db710] machine_kexec at ffffffff81046239
          #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248
          #2 [ffff88085c6db830] oops_end at ffffffff81008ae7
          #3 [ffff88085c6db860] no_context at ffffffff81050b8f
          #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75
          #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83
          #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e
          #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c
          #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122
          [exception RIP: uart_put_char+149]
          RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006
          RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081
          RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120
          RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320
          R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000
          R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba
          ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
          #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544
          #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c
          #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b
          #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2
          #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b
          #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a
          #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016
          #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194
          #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a
          #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2
          #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d
          #20 [ffff88085c6dbeb0] kthread at ffffffff81096384
          #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f

      after slogging through some disassembly:
      
      ffffffff814b6720 <uart_put_char>:
      ffffffff814b6720:	55                   	push   %rbp
      ffffffff814b6721:	48 89 e5             	mov    %rsp,%rbp
      ffffffff814b6724:	48 83 ec 20          	sub    $0x20,%rsp
      ffffffff814b6728:	48 89 1c 24          	mov    %rbx,(%rsp)
      ffffffff814b672c:	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
      ffffffff814b6731:	4c 89 6c 24 10       	mov    %r13,0x10(%rsp)
      ffffffff814b6736:	4c 89 74 24 18       	mov    %r14,0x18(%rsp)
      ffffffff814b673b:	e8 b0 8e 58 00       	callq  ffffffff81a3f5f0 <mcount>
      ffffffff814b6740:	4c 8b a7 88 02 00 00 	mov    0x288(%rdi),%r12
      ffffffff814b6747:	45 31 ed             	xor    %r13d,%r13d
      ffffffff814b674a:	41 89 f6             	mov    %esi,%r14d
      ffffffff814b674d:	49 83 bc 24 70 01 00 	cmpq   $0x0,0x170(%r12)
      ffffffff814b6754:	00 00
      ffffffff814b6756:	49 8b 9c 24 80 01 00 	mov    0x180(%r12),%rbx
      ffffffff814b675d:	00
      ffffffff814b675e:	74 2f                	je     ffffffff814b678f <uart_put_char+0x6f>
      ffffffff814b6760:	48 89 df             	mov    %rbx,%rdi
      ffffffff814b6763:	e8 a8 67 58 00       	callq  ffffffff81a3cf10 <_raw_spin_lock_irqsave>
      ffffffff814b6768:	41 8b 8c 24 78 01 00 	mov    0x178(%r12),%ecx
      ffffffff814b676f:	00
      ffffffff814b6770:	89 ca                	mov    %ecx,%edx
      ffffffff814b6772:	f7 d2                	not    %edx
      ffffffff814b6774:	41 03 94 24 7c 01 00 	add    0x17c(%r12),%edx
      ffffffff814b677b:	00
      ffffffff814b677c:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
      ffffffff814b6782:	75 23                	jne    ffffffff814b67a7 <uart_put_char+0x87>
      ffffffff814b6784:	48 89 c6             	mov    %rax,%rsi
      ffffffff814b6787:	48 89 df             	mov    %rbx,%rdi
      ffffffff814b678a:	e8 e1 64 58 00       	callq  ffffffff81a3cc70 <_raw_spin_unlock_irqrestore>
      ffffffff814b678f:	44 89 e8             	mov    %r13d,%eax
      ffffffff814b6792:	48 8b 1c 24          	mov    (%rsp),%rbx
      ffffffff814b6796:	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
      ffffffff814b679b:	4c 8b 6c 24 10       	mov    0x10(%rsp),%r13
      ffffffff814b67a0:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
      ffffffff814b67a5:	c9                   	leaveq
      ffffffff814b67a6:	c3                   	retq
      ffffffff814b67a7:	49 8b 94 24 70 01 00 	mov    0x170(%r12),%rdx
      ffffffff814b67ae:	00
      ffffffff814b67af:	48 63 c9             	movslq %ecx,%rcx
      ffffffff814b67b2:	41 b5 01             	mov    $0x1,%r13b
      ffffffff814b67b5:	44 88 34 0a          	mov    %r14b,(%rdx,%rcx,1)
      ffffffff814b67b9:	41 8b 94 24 78 01 00 	mov    0x178(%r12),%edx
      ffffffff814b67c0:	00
      ffffffff814b67c1:	83 c2 01             	add    $0x1,%edx
      ffffffff814b67c4:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
      ffffffff814b67ca:	41 89 94 24 78 01 00 	mov    %edx,0x178(%r12)
      ffffffff814b67d1:	00
      ffffffff814b67d2:	eb b0                	jmp    ffffffff814b6784 <uart_put_char+0x64>
      ffffffff814b67d4:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
      ffffffff814b67db:	00 00 00 00 00
      
      for our build, this is crashing at:
      
          circ->buf[circ->head] = c;
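
Read against the listing above, the hot path can be approximated by the
user-space sketch below. It is a reconstruction under assumptions: the
struct layout and names are hypothetical, the 4096-byte size is inferred
from the `and $0xfff` mask, and locking is left out entirely. The
not/add/and sequence corresponds to the free-space computation, and the
faulting instruction at uart_put_char+149 is the store into buf[head]; the
RDX value of 0 in the register dump is consistent with buf being NULL at
that point.

```c
#include <stddef.h>

#define XMIT_SIZE 4096u     /* assumption, inferred from the 0xfff mask */

/* Hypothetical stand-in for the transmit ring in the port state. */
struct circ {
    char *buf;
    unsigned int head;
    unsigned int tail;
};

/* Free slots, always keeping one empty: (tail - (head + 1)) mod size. */
unsigned int circ_space(const struct circ *c)
{
    return (c->tail - (c->head + 1u)) & (XMIT_SIZE - 1u);
}

/* Returns 1 if the character was queued, 0 otherwise. No locking here. */
int put_char(struct circ *c, char ch)
{
    if (c->buf == NULL)
        return 0;                   /* buffer not allocated */
    if (circ_space(c) == 0)
        return 0;                   /* ring is full */

    c->buf[c->head] = ch;           /* the store that faults in the crash */
    c->head = (c->head + 1u) & (XMIT_SIZE - 1u);
    return 1;
}
```

In the disassembly the NULL test happens before the spinlock is taken and
buf is loaded again afterwards, so a NULL check alone does not help unless
the pointer is also changed under that same lock.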
      
      Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf) is
      protected by the "per-port mutex", which based on uart_port_check() is
      state->port.mutex. Indeed, the lock acquired in uart_put_char() is
      uport->lock, i.e. not the same lock.

      Anyway, since the lock is not acquired, if uart_shutdown() is called, the
      last chunk of that function may release state->xmit.buf before it's assigned
      to NULL, and cause the race above.
      
      To fix it, let's lock uport->lock when allocating/deallocating
      state->xmit.buf in addition to the per-port mutex.
      
      v2: switch to locking uport->lock on allocation/deallocation instead of
          locking the per-port mutex in uart_put_char. Note that since
          uport->lock is a spin lock, we have to switch the allocation to
          GFP_ATOMIC.
      v3: move the allocation outside the lock, so we can switch back to
          GFP_KERNEL
      Signed-off-by: Tycho Andersen <tycho@tycho.ws>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d286cfd4
    • Mikulas Patocka's avatar
      dm crypt: don't decrease device limits · 5044eb05
      Mikulas Patocka authored
      commit bc9e9cf0 upstream.
      
      dm-crypt should only increase device limits, it should not decrease them.
      
      This fixes a bug where the user could create a crypt device with a 1024
      sector size on top of a SCSI device that had a 4096 logical block size.
      The 4096 limit would be lost and the user could incorrectly send
      1024-byte I/Os to the crypt device.
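
The "only increase" rule amounts to taking the maximum of the inherited
limit and the crypt sector size. A minimal sketch, assuming a simplified
limits struct with hypothetical field and function names rather than the
kernel's queue_limits:

```c
/* Simplified, hypothetical subset of the block-layer limits (in bytes). */
struct limits {
    unsigned int logical_block_size;
    unsigned int physical_block_size;
    unsigned int io_min;
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* Raise each limit to at least sector_size; never lower an inherited one. */
void crypt_apply_limits(struct limits *l, unsigned int sector_size)
{
    l->logical_block_size = max_uint(l->logical_block_size, sector_size);
    l->physical_block_size = max_uint(l->physical_block_size, sector_size);
    l->io_min = max_uint(l->io_min, sector_size);
}
```

For the case in the message, an underlying 4096 limit combined with a 1024
crypt sector size stays at 4096.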
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5044eb05
    • Ilya Dryomov's avatar
      dm cache metadata: set dirty on all cache blocks after a crash · f961be89
      Ilya Dryomov authored
      commit 5b1fe7be upstream.
      
      Quoting Documentation/device-mapper/cache.txt:
      
        The 'dirty' state for a cache block changes far too frequently for us
        to keep updating it on the fly.  So we treat it as a hint.  In normal
        operation it will be written when the dm device is suspended.  If the
        system crashes all cache blocks will be assumed dirty when restarted.
      
      This got broken in commit f177940a ("dm cache metadata: switch to
      using the new cursor api for loading metadata") in 4.9, which removed
      the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
      flag) when loading cache blocks.  This results in data corruption on an
      unclean shutdown with dirty cache blocks on the fast device.  After the
      crash those blocks are considered clean and may get evicted from the
      cache at any time.  This can be demonstrated by doing a lot of reads
      to trigger individual evictions, but uncache is more predictable:
      
        ### Disable auto-activation in lvm.conf to be able to do uncache in
        ### time (i.e. see uncache doing flushing) when the fix is applied.
      
        # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
        # vgcreate vg_cache /dev/vdb /dev/vdc
        # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
        # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
        # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
        # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
        # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
        # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
        # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
        0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        # dmsetup status vg_cache-lv_slowdev
        0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
                                                                  ^^^^
                                      7065 * 64k = 441M yet to be written to the slow device
        # echo b >/proc/sysrq-trigger
      
        # vgchange -ay vg_cache
        # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
        0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        # lvconvert --uncache vg_cache/lv_slowdev
        Flushing 0 blocks for cache vg_cache/lv_slowdev.
        Logical volume "lv_cachedev" successfully removed
        Logical volume vg_cache/lv_slowdev is not cached.
        # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
        0fe00000:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
        0fe00010:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
      
      This is the case with both v1 and v2 cache pool metadata formats.
      
      After applying this patch:
      
        # vgchange -ay vg_cache
        # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
        0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        # lvconvert --uncache vg_cache/lv_slowdev
        Flushing 3724 blocks for cache vg_cache/lv_slowdev.
        ...
        Flushing 71 blocks for cache vg_cache/lv_slowdev.
        Logical volume "lv_cachedev" successfully removed
        Logical volume vg_cache/lv_slowdev is not cached.
        # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
        0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
        0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
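
The rule quoted from the documentation, and the arithmetic in the dmsetup
status annotation, can be sketched as follows. Function names are
hypothetical; this only illustrates the behaviour described above, not the
metadata loading code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Dirty state to assume for a block when loading metadata: trust the
 * on-disk bit only after a clean shutdown, otherwise assume dirty.
 */
bool block_is_dirty(bool clean_shutdown, bool dirty_on_disk)
{
    return clean_shutdown ? dirty_on_disk : true;
}

/* Dirty data in whole MiB for a number of dirty blocks of block_kib KiB. */
uint64_t dirty_mib(uint64_t dirty_blocks, uint64_t block_kib)
{
    return dirty_blocks * block_kib / 1024;
}
```

dirty_mib(7065, 64) gives 441, matching the "7065 * 64k = 441M" note in
the status output.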
      
      Cc: stable@vger.kernel.org
      Fixes: f177940a ("dm cache metadata: switch to using the new cursor api for loading metadata")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f961be89
    • Mike Snitzer's avatar
      dm cache metadata: save in-core policy_hint_size to on-disk superblock · b7227e60
      Mike Snitzer authored
      commit fd2fa954 upstream.
      
      policy_hint_size starts as 0 during __write_initial_superblock().  It
      isn't until the policy is loaded that policy_hint_size is set in-core
      (cmd->policy_hint_size).  But it never got recorded in the on-disk
      superblock because __commit_transaction() didn't deal with transferring
      the in-core cmd->policy_hint_size to the on-disk superblock.
      
      The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
      __begin_transaction_flags() which re-reads all superblock fields.
      Because the superblock's policy_hint_size was never properly stored, when
      the cache was created, hints_array_available() would always return false
      when re-activating a previously created cache.  This means
      __load_mappings() always considered the hints invalid and never made use
      of the hints (these hints served to optimize).
      
      Another detrimental side-effect of this oversight is that the cache_check
      utility would fail with: "invalid hint width: 0"
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7227e60
    • Hou Tao's avatar
      dm thin: stop no_space_timeout worker when switching to write-mode · 3bef8825
      Hou Tao authored
      commit 75294442 upstream.
      
      Now both check_for_space() and do_no_space_timeout() will read & write
      pool->pf.error_if_no_space.  If these functions run concurrently, as
      shown in the following case, the default setting of "queue_if_no_space"
      can get lost.
      
      precondition:
          * error_if_no_space = false (aka "queue_if_no_space")
          * pool is in Out-of-Data-Space (OODS) mode
          * no_space_timeout worker has been queued
      
      CPU 0:                          CPU 1:
      // delete a thin device
      process_delete_mesg()
      // check_for_space() invoked by commit()
      set_pool_mode(pool, PM_WRITE)
          pool->pf.error_if_no_space = \
           pt->requested_pf.error_if_no_space
      
      				// timeout, pool is still in OODS mode
      				do_no_space_timeout
      				    // "queue_if_no_space" config is lost
      				    pool->pf.error_if_no_space = true
          pool->pf.mode = new_mode
      
      Fix it by stopping no_space_timeout worker when switching to write mode.
      
      Fixes: bcc696fa ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
      Cc: stable@vger.kernel.org
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bef8825