1. 02 Oct, 2020 2 commits
    • mm: memcg/slab: fix slab statistics in !SMP configuration · be458311
      Roman Gushchin authored
      Since commit ea426c2a ("mm: memcg: prepare for byte-sized vmstat
      items") the write side of slab counters accepts a value in bytes and
      converts it to pages.  It happens in __mod_node_page_state().
      
      However, the non-SMP version of __mod_node_page_state() doesn't perform
      this conversion, which leads to incorrect (unrealistically high) slab
      counter values.  Fix this by adding a similar conversion to the non-SMP
      version of __mod_node_page_state().
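
      A minimal userspace sketch of the conversion described above, not the
      kernel code itself.  The 4 KiB page size, the trimmed-down counter list
      and the helper names item_in_bytes() and mod_node_state() are
      assumptions made for illustration; the point is only that a byte-sized
      item has to be scaled to pages on every write path.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page geometry for this sketch: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1L << PAGE_SHIFT)

/* Hypothetical, trimmed-down set of per-node counters. */
enum node_stat_item {
	NR_FILE_PAGES,            /* callers pass a delta in pages */
	NR_SLAB_RECLAIMABLE_B,    /* callers pass a delta in bytes */
	NR_SLAB_UNRECLAIMABLE_B,  /* callers pass a delta in bytes */
	NR_NODE_STAT_ITEMS
};

/* Counters are always stored in pages. */
static long node_stat[NR_NODE_STAT_ITEMS];

/* True for items whose write side takes bytes instead of pages. */
static bool item_in_bytes(enum node_stat_item item)
{
	return item == NR_SLAB_RECLAIMABLE_B ||
	       item == NR_SLAB_UNRECLAIMABLE_B;
}

/* Write side: scale byte-sized deltas to pages before accounting. */
static void mod_node_state(enum node_stat_item item, long delta)
{
	if (item_in_bytes(item)) {
		/* Byte deltas are expected to be whole pages. */
		assert((delta & (PAGE_SIZE - 1)) == 0);
		delta /= PAGE_SIZE;
	}
	node_stat[item] += delta;
}
```

      Without the item_in_bytes() branch, a slab update of 8 * PAGE_SIZE bytes
      would be accounted as 32768 pages instead of 8, which is the kind of
      inflated value the report describes.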

      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-and-tested-by: Bastian Bittorf <bb@npl.de>
      Fixes: ea426c2a ("mm: memcg: prepare for byte-sized vmstat items")
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pipe: remove pipe_wait() and fix wakeup race with splice · 472e5b05
      Linus Torvalds authored
      The pipe splice code still used the old model of waiting for pipe IO by
      using a non-specific "pipe_wait()" that waited for any pipe event to
      happen, which depended on all pipe IO being entirely serialized by the
      pipe lock.  So by checking the state you were waiting for, and then
      adding yourself to the wait queue before dropping the lock, you were
      guaranteed to see all the wakeups.
      
      Strictly speaking, the actual wakeups were not done under the lock, but
      the pipe_wait() model still worked: since the waiter held the lock when
      checking whether it should sleep, it would always see the current
      state, and the wakeup was always done after updating the state.
      
      However, commit 0ddad21d ("pipe: use exclusive waits when reading or
      writing") split the single wait-queue into two, and in the process also
      made the "wait for event" code wait for _two_ wait queues, and that then
      showed a race with the wakers that were not serialized by the pipe lock.
      
      It's only splice that used that "pipe_wait()" model, so the problem
      wasn't obvious, but Josef Bacik reports:
      
       "I hit a hang with fstest btrfs/187, which does a btrfs send into
        /dev/null. This works by creating a pipe, the write side is given to
        the kernel to write into, and the read side is handed to a thread that
        splices into a file, in this case /dev/null.
      
        The box that was hung had the write side stuck here [pipe_write] and
        the read side stuck here [splice_from_pipe_next -> pipe_wait].
      
        [ more details about pipe_wait() scenario ]
      
        The problem is we're doing the prepare_to_wait, which sets our state
        each time, however we can be woken up either with reads or writes. In
        the case above we race with the WRITER waking us up, and re-set our
        state to INTERRUPTIBLE, and thus never break out of schedule"
      
      Josef had a patch that avoided the issue in pipe_wait() by just making
      it set the state only once, but the deeper problem is that pipe_wait()
      depends on a level of synchronization by the pipe mutex that it really
      shouldn't.  And the whole "wait for any pipe state change" model really
      isn't very good to begin with.
      
      So rather than trying to work around things in pipe_wait(), remove that
      legacy model of "wait for arbitrary pipe event" entirely, and actually
      create functions that wait for the pipe actually being readable or
      writable, and can do so without depending on the pipe lock serializing
      everything.
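
      A userspace analogy of "wait for the pipe actually being readable or
      writable", sketched with POSIX condition variables.  This is not the
      kernel implementation: struct toy_pipe and every toy_pipe_*() name are
      hypothetical, and a condition variable is always paired with its mutex,
      whereas the kernel helpers are meant to work without the pipe lock
      serializing everything.  The sketch only illustrates waiting on one
      specific predicate, on its own queue, and re-checking it after every
      wakeup.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4u  /* power of two, so the indices may wrap freely */

struct toy_pipe {
	pthread_mutex_t lock;
	pthread_cond_t rd_wait;  /* readers sleep here until data arrives */
	pthread_cond_t wr_wait;  /* writers sleep here until a slot frees up */
	unsigned int head;       /* next slot to fill */
	unsigned int tail;       /* next slot to drain */
	int slots[RING_SLOTS];
};

static void toy_pipe_init(struct toy_pipe *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->rd_wait, NULL);
	pthread_cond_init(&p->wr_wait, NULL);
	p->head = p->tail = 0;
}

static bool toy_pipe_readable(const struct toy_pipe *p)
{
	return p->head != p->tail;
}

static bool toy_pipe_writable(const struct toy_pipe *p)
{
	return p->head - p->tail < RING_SLOTS;
}

/* Caller holds p->lock.  Returns only once there is data to read. */
static void toy_pipe_wait_readable(struct toy_pipe *p)
{
	while (!toy_pipe_readable(p))
		pthread_cond_wait(&p->rd_wait, &p->lock);
}

/* Caller holds p->lock.  Returns only once there is room to write. */
static void toy_pipe_wait_writable(struct toy_pipe *p)
{
	while (!toy_pipe_writable(p))
		pthread_cond_wait(&p->wr_wait, &p->lock);
}

static void toy_pipe_write(struct toy_pipe *p, int value)
{
	pthread_mutex_lock(&p->lock);
	toy_pipe_wait_writable(p);
	p->slots[p->head++ % RING_SLOTS] = value;
	pthread_cond_signal(&p->rd_wait);  /* wake a reader only */
	pthread_mutex_unlock(&p->lock);
}

static int toy_pipe_read(struct toy_pipe *p)
{
	int value;

	pthread_mutex_lock(&p->lock);
	toy_pipe_wait_readable(p);
	value = p->slots[p->tail++ % RING_SLOTS];
	pthread_cond_signal(&p->wr_wait);  /* wake a writer only */
	pthread_mutex_unlock(&p->lock);
	return value;
}
```

      Because each waiter loops on its own predicate, a wakeup that was meant
      for the other side cannot leave it asleep with the condition already
      true, which is the failure mode in the report quoted above.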
      
      Fixes: 0ddad21d ("pipe: use exclusive waits when reading or writing")
      Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
      Reported-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 Oct, 2020 7 commits
    • Merge tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 44b6e23b
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix a device reference counting bug in the Exynos IOMMU driver.
      
       - Lockdep fix for the Intel VT-d driver.
      
       - Fix a bug in the AMD IOMMU driver which caused corruption of the IVRS
         ACPI table and caused IOMMU driver initialization failures in kdump
         kernels.
      
      * tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
        iommu/amd: Fix the overwritten field in IVMD header
        iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · eed2ef44
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "A previous commit to prevent AML memory opregions from accessing the
        kernel memory turned out to be too restrictive. Relax the permission
        check to permit the ACPI core to map kernel memory used for table
        overrides"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: permit ACPI core to map kernel memory used for table overrides
    • Merge tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm · fcadab74
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "AMD and vmwgfx fixes.
      
        Just dequeuing these a bit early as the AMD ones are a bit larger than
        I'd prefer, but Alex missed last week so it's a double set of fixes.
        The larger ones are just register header fixes for the new chips that
        were just introduced in rc1 along with some new PCI IDs for new hw.
        Otherwise it is usual fixes.
      
        The vmwgfx fix was due to some testing I was doing and found we
        weren't booting properly, vmware had the fix internally so hurried it
      
        vmwgfx:
         - fix a regression due to TTM refactor
      
        amdgpu:
         - Fix potential double free in userptr handling
         - Sienna Cichlid and Navy Flounder updates
         - Add Sienna Cichlid PCI IDs
         - Drop experimental flag for navi12
         - Raven fixes
         - Renoir fixes
         - HDCP fix
         - DCN3 fix for clang and older versions of gcc
         - Fix a runtime pm refcount issue"
      
      * tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: disable gfxoff temporarily for navy_flounder
        drm/amd/pm: setup APU dpm clock table in SMU HW initialization
        drm/vmwgfx: Fix error handling in get_node
        drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()
        drm/amdgpu/swsmu/smu12: fix force clock handling for mclk
        drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
        drm/amdgpu/display: fix CFLAGS setup for DCN30
        drm/amd/display: fix return value check for hdcp_work
        drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc.
        drm/amd/pm: Removed fixed clock in auto mode DPM
        drm/amdgpu: remove experimental flag from navi12
        drm/amdgpu: add device ID for sienna_cichlid (v2)
        drm/amdgpu: use the AV1 defines for VCN 3.0
        drm/amdgpu: add VCN 3.0 AV1 registers
        drm/amdgpu: add the GC 10.3 VRS registers
        drm/amdgpu: prevent double kfree ttm->sg
    • Merge tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · aa5ff935
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Two tracing fixes:
      
         - Fix temp buffer accounting that caused a WARNING for
           ftrace_dump_on_oops()
      
         - Move the recursion check in one of the function callback helpers to
           the beginning of the function, as if the rcu_is_watching() gets
           traced, it will cause a recursive loop that will crash the kernel"
      
      * tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Move RCU is watching check after recursion check
        tracing: Fix trace_find_next_entry() accounting of temp buffer size
    • iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() · 1a3f2fd7
      Lu Baolu authored
      Lock(&iommu->lock) without disabling irq causes lockdep warnings.
      
      [   12.703950] ========================================================
      [   12.703962] WARNING: possible irq lock inversion dependency detected
      [   12.703975] 5.9.0-rc6+ #659 Not tainted
      [   12.703983] --------------------------------------------------------
      [   12.703995] systemd-udevd/284 just changed the state of lock:
      [   12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at:
                     iommu_flush_dev_iotlb.part.57+0x2e/0x90
      [   12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past:
      [   12.704043]  (&iommu->lock){+.+.}-{2:2}
      [   12.704045]
      
                     and interrupts could create inverse lock ordering between
                     them.
      
      [   12.704073]
                     other info that might help us debug this:
      [   12.704085]  Possible interrupt unsafe locking scenario:
      
      [   12.704097]        CPU0                    CPU1
      [   12.704106]        ----                    ----
      [   12.704115]   lock(&iommu->lock);
      [   12.704123]                                local_irq_disable();
      [   12.704134]                                lock(device_domain_lock);
      [   12.704146]                                lock(&iommu->lock);
      [   12.704158]   <Interrupt>
      [   12.704164]     lock(device_domain_lock);
      [   12.704174]
                      *** DEADLOCK ***

      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/amd: Fix the overwritten field in IVMD header · 0bbe4ced
      Adrian Huang authored
      Commit 387caf0b ("iommu/amd: Treat per-device exclusion
      ranges as r/w unity-mapped regions") accidentally overwrites
      the 'flags' field in IVMD (struct ivmd_header) when the I/O
      virtualization memory definition is associated with the
      exclusion range entry. This corrupts the IVRS ACPI table
      (incorrect checksum). The kdump kernel reports the invalid checksum:
      
      ACPI BIOS Warning (bug): Incorrect checksum in table [IVRS] - 0x5C, should be 0x60 (20200717/tbprint-177)
      AMD-Vi: [Firmware Bug]: IVRS invalid checksum
      
      Fix the above-mentioned issue by modifying the 'struct unity_map_entry'
      member instead of the IVMD header.
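
      A small sketch of why an in-place write to a firmware table shows up as
      a checksum failure.  ACPI tables carry a checksum byte chosen so that
      all bytes of the table sum to zero modulo 256.  The 8-byte table, the
      checksum offset and the helper names table_sum() and seal_table() are
      made up for illustration and do not reflect the real IVRS layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all table bytes modulo 256; a valid table sums to zero. */
static uint8_t table_sum(const uint8_t *table, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = (uint8_t)(sum + table[i]);
	return sum;
}

/* Store the checksum byte that makes the whole table sum to zero. */
static void seal_table(uint8_t *table, size_t len, size_t csum_off)
{
	table[csum_off] = 0;
	table[csum_off] = (uint8_t)(0u - table_sum(table, len));
}
```

      Once a table is sealed, setting an extra bit in any byte leaves a
      non-zero sum, so a later consumer that re-validates the same memory,
      such as a kdump kernel, rejects it.  Keeping the modified flags in a
      driver-private structure leaves the firmware table untouched.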
      
      Cleanup: The *exclusion_range* functions are not used anymore, so
      get rid of them.
      
      Fixes: 387caf0b ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
      Reported-and-tested-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Link: https://lore.kernel.org/r/20200926102602.19177-1-adrianhuang0701@gmail.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • Merge tag 'amd-drm-fixes-5.9-2020-09-30' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 132d7c8a
      Dave Airlie authored
      
      amd-drm-fixes-5.9-2020-09-30:
      
      amdgpu:
      - Fix potential double free in userptr handling
      - Sienna Cichlid and Navy Flounder updates
      - Add Sienna Cichlid PCI IDs
      - Drop experimental flag for navi12
      - Raven fixes
      - Renoir fixes
      - HDCP fix
      - DCN3 fix for clang and older versions of gcc
      - Fix a runtime pm refcount issue

      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200930161326.4243-1-alexander.deucher@amd.com
  3. 30 Sep, 2020 8 commits
  4. 29 Sep, 2020 13 commits
  5. 28 Sep, 2020 3 commits
  6. 27 Sep, 2020 7 commits
    • Linux 5.9-rc7 · a1b8638b
      Linus Torvalds authored
    • Merge tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 16bc1d54
      Linus Torvalds authored
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - ignore compiler stubs for PPC to fix builds
      
       - fix the usage of --target mentioned in the LLVM document
      
      * tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation/llvm: Fix clang target examples
        scripts/kallsyms: skip ppc compiler stub *.long_branch.* / *.plt_branch.*
    • Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f8818559
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for the x86 interrupt code:
      
         - Unbreak the magic 'search the timer interrupt' logic in IO/APIC
           code which got wrecked when the core interrupt code made the
           state tracking logic stricter.
      
           That caused the interrupt line to stay masked after switching from
           IO/APIC to PIC delivery mode, which obviously prevents interrupts
           from being delivered.
      
         - Make run_on_irqstack_cond() typesafe. The function argument is a
           void pointer which is then cast to 'void (*fun)(void *)'.

           This breaks Control Flow Integrity checking in clang. Use proper
           helper functions for the three variants required"
      
      * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioapic: Unbreak check_timer()
        x86/irq: Make run_on_irqstack_cond() typesafe
    • Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ba25f057
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "A set of clocksource/clockevents updates:
      
         - Reset the TI/DM timer before enabling it instead of doing it the
           other way round.
      
         - Initialize the reload value for the GX6605s timer correctly so the
           hardware counter starts at 0 again after overrun.
      
         - Make error return value negative in the h8300 timer init function"
      
      * tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-gx6605s: Fixup counter reload
        clocksource/drivers/timer-ti-dm: Do reset before enable
        clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
    • mm/thp: Split huge pmds/puds if they're pinned when fork() · d042035e
      Peter Xu authored
      Pinned pages shouldn't be write-protected when fork() happens, because
      follow up copy-on-write on these pages could cause the pinned pages to
      be replaced by random newly allocated pages.
      
      For huge PMDs, we split the huge pmd if pinning is detected, so that
      future handling will be done at the PTE level (with our latest changes,
      each of the small pages will be copied).  We can achieve this by letting
      copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll
      fall through in copy_pmd_range() and finally land in the next
      copy_pte_range() call.
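
      A simplified model of that control flow, assuming hypothetical toy_*
      types and helpers rather than the real mm structures.  The maybe_pinned
      flag stands in for a page_maybe_dma_pinned() style check, and the
      "split" is reduced to clearing a flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for one PMD-level entry in the parent. */
struct toy_pmd {
	bool huge;          /* maps one huge page rather than a PTE table */
	bool maybe_pinned;  /* the mapped page may be pinned for DMA */
};

/* Which level ended up handling the copy for this entry. */
enum copy_level { COPY_AT_PMD, COPY_AT_PTE };

/*
 * Model of the huge-PMD copy: a pinned huge page is split and -EAGAIN
 * tells the caller to retry at the PTE level; otherwise the huge
 * mapping is shared as a whole and 0 is returned.
 */
static int toy_copy_huge_pmd(struct toy_pmd *src)
{
	if (src->maybe_pinned) {
		src->huge = false;  /* split into small pages */
		return -EAGAIN;
	}
	return 0;
}

/* Model of the per-entry step in the PMD range walk. */
static enum copy_level toy_copy_pmd_entry(struct toy_pmd *src)
{
	if (src->huge) {
		if (toy_copy_huge_pmd(src) != -EAGAIN)
			return COPY_AT_PMD;
		/* -EAGAIN: fall through to the PTE-level copy */
	}
	return COPY_AT_PTE;
}
```

      An unpinned huge entry is handled at the PMD level as before, while a
      pinned one is split and takes the same PTE-level path as an entry that
      was never huge.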
      
      Huge PUDs are even more special: so far they do not support
      anonymous pages.  But they can be handled the same way as huge PMDs,
      even though splitting a huge PUD means erasing the PUD entries.  That
      guarantees that follow-up faults will remap the same pages in either
      parent or child later.
      
      This might not be the most efficient way, but it should be easy and
      clean enough.  It should be fine, since we're tackling a very rare
      case just to make sure userspace programs that pinned some THPs will
      still work even without MADV_DONTFORK and after they fork()ed.

      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: Do early cow for pinned pages during fork() for ptes · 70e806e4
      Peter Xu authored
      This allows copy_pte_range() to do early cow if the pages were pinned on
      the source mm.
      
      Currently we don't have an accurate way to know whether a page is pinned
      or not.  The only thing we have is page_maybe_dma_pinned().  However,
      that's good enough for now, especially with the newly added
      mm->has_pinned flag making sure we won't affect processes that never
      pinned any pages.
      
      It would be easier if we could do a GFP_KERNEL allocation within
      copy_one_pte().  Unfortunately, we can't, because the page table
      locks are held for both the parent and child processes.  So the page
      allocation needs to be done outside copy_one_pte().
      
      There is some trickery in copy_present_pte(), mainly the wrprotect trick
      to block concurrent fast-gup.  The comments in the function explain it
      in more detail.
      
      Oleg Nesterov reported a (probably harmless) bug during review: we
      didn't reset entry.val properly in copy_pte_range(), so there was
      potentially a chance of calling add_swap_count_continuation() multiple
      times on the same swp entry.  That should be harmless, since even if it
      happens, add_swap_count_continuation() will return directly on noticing
      that there is enough space for the swp counter.  So instead of a
      standalone stable patch, it is touched up in this patch directly.
      
      Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/fork: Pass new vma pointer into copy_page_range() · 7a4830c3
      Peter Xu authored
      This prepares for the future work to trigger early cow on pinned pages
      during fork().
      
      No functional change intended.

      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>