1. 09 Sep, 2020 1 commit
  2. 08 Sep, 2020 2 commits
  3. 03 Sep, 2020 1 commit
  4. 02 Sep, 2020 1 commit
    • powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc · 675bceb0
      Aneesh Kumar K.V authored
      The test is broken w.r.t page table update rules and results in kernel
      crash as below. Disable the support until we get the tests updated.
      
      [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
      cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
          pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
          lr: c0000000005eeeec: pte_update+0x11c/0x190
          sp: c000000c6d1e7950
         msr: 8000000002029033
        current = 0xc000000c6d172c80
        paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
          pid   = 1, comm = swapper/0
      kernel BUG at arch/powerpc/mm/pgtable.c:304!
      [link register   ] c0000000005eeeec pte_update+0x11c/0x190
      [c000000c6d1e7950] 0000000000000001 (unreliable)
      [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
      [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
      [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
      [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
      [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
      [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
      [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
      
      With DEBUG_VM disabled
      
      [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
      [   20.530183] Faulting instruction address: 0xc0000000000df330
      cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
          pc: c0000000000df330: memset+0x68/0x104
          lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
          sp: c000000c6d19f990
         msr: 8000000002009033
         dar: 0
        current = 0xc000000c6d177480
        paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
          pid   = 1, comm = swapper/0
      [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
      [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
      [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
      [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
      [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
      [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
      [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
      [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
      33:mon>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200902040122.136414-1-aneesh.kumar@linux.ibm.com
  5. 01 Sep, 2020 1 commit
  6. 28 Aug, 2020 2 commits
    • powerpc/book3s64/radix: Fix boot failure with large amount of guest memory · 103a8542
      Aneesh Kumar K.V authored
      If the hypervisor doesn't support hugepages, the kernel ends up allocating a large
      number of page table pages. The early page table allocation was wrongly
      setting the max memblock limit to ppc64_rma_size with radix translation
      which resulted in boot failure as shown below.
      
      Kernel panic - not syncing:
      early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000 nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff
       CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
       Call Trace:
       [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114 (unreliable)
       [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418
       [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec
       [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4
       [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c
       [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170
      
      This was because the kernel was checking for the radix feature before we enable the
      feature via mmu_features. This resulted in the kernel using hash restrictions on
      radix.
      
      Rework the early init code such that the kernel boots with memblock restrictions
      as imposed by hash. At that point, the kernel still hasn't finalized the
      translation it will end up using.
      
      We have three different ways of detecting radix.
      
      1. dt_cpu_ftrs_scan -> used only in case of PowerNV
      2. ibm,pa-features -> Used when we don't use dt_cpu_ftrs_scan
      3. CAS -> Where we negotiate with hypervisor about the supported translation.
      
      We look at 1 or 2 early in the boot and after that, we look at the CAS vector to
      finalize the translation the kernel will use. We also support a kernel command
      line option (disable_radix) to switch to hash.
      
      Update the memblock limit after mmu_early_init_devtree() if the kernel is going
      to use radix translation. This forces some of the memblock allocations we do before
      mmu_early_init_devtree() to be within the RMA limit.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200828100852.426575-1-aneesh.kumar@linux.ibm.com
    • powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU · 4a133eb3
      Christophe Leroy authored
      low_sleep_handler() can't restore the context from virtual
      stack because the stack can hardly be accessed with MMU OFF.
      
      For now, disable VMAP stack when CONFIG_ADB_PMU is selected.
      
      Fixes: cd08f109 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
      Cc: stable@vger.kernel.org # v5.6+
      Reported-by: Giuseppe Sacco <giuseppe@sguazz.it>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/ec96c15bfa1a7415ab604ee1c98cd45779c08be0.1598553015.git.christophe.leroy@csgroup.eu
  7. 27 Aug, 2020 7 commits
  8. 24 Aug, 2020 3 commits
  9. 21 Aug, 2020 2 commits
  10. 20 Aug, 2020 3 commits
  11. 18 Aug, 2020 3 commits
    • powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death · 801980f6
      Michael Roth authored
      For a power9 KVM guest with XIVE enabled, running a test loop
      where we hotplug 384 vcpus and then unplug them, the following traces
      can be seen (generally within a few loops) either from the unplugged
      vcpu:
      
        cpu 65 (hwid 65) Ready to die...
        Querying DEAD? cpu 66 (66) shows 2
        list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
        ------------[ cut here ]------------
        kernel BUG at lib/list_debug.c:56!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
        CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
        NIP:  c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
        REGS: c0000009e5a17840 TRAP: 0700   Not tainted  (4.18.0-221.el8.ppc64le)
        MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000842  XER: 20040000
        ...
        NIP __list_del_entry_valid+0xac/0x100
        LR  __list_del_entry_valid+0xa8/0x100
        Call Trace:
          __list_del_entry_valid+0xa8/0x100 (unreliable)
          free_pcppages_bulk+0x1f8/0x940
          free_unref_page+0xd0/0x100
          xive_spapr_cleanup_queue+0x148/0x1b0
          xive_teardown_cpu+0x1bc/0x240
          pseries_mach_cpu_die+0x78/0x2f0
          cpu_die+0x48/0x70
          arch_cpu_idle_dead+0x20/0x40
          do_idle+0x2f4/0x4c0
          cpu_startup_entry+0x38/0x40
          start_secondary+0x7bc/0x8f0
          start_secondary_prolog+0x10/0x14
      
      or on the worker thread handling the unplug:
      
        pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
        Querying DEAD? cpu 314 (314) shows 2
        BUG: Bad page state in process kworker/u768:3  pfn:95de1
        cpu 314 (hwid 314) Ready to die...
        page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
        flags: 0x5ffffc00000000()
        raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
        raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
        page dumped because: nonzero mapcount
        Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
        CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
        Workqueue: pseries hotplug workque pseries_hp_work_fn
        Call Trace:
          dump_stack+0xb0/0xf4 (unreliable)
          bad_page+0x12c/0x1b0
          free_pcppages_bulk+0x5bc/0x940
          page_alloc_cpu_dead+0x118/0x120
          cpuhp_invoke_callback.constprop.5+0xb8/0x760
          _cpu_down+0x188/0x340
          cpu_down+0x5c/0xa0
          cpu_subsys_offline+0x24/0x40
          device_offline+0xf0/0x130
          dlpar_offline_cpu+0x1c4/0x2a0
          dlpar_cpu_remove+0xb8/0x190
          dlpar_cpu_remove_by_index+0x12c/0x150
          dlpar_cpu+0x94/0x800
          pseries_hp_work_fn+0x128/0x1e0
          process_one_work+0x304/0x5d0
          worker_thread+0xcc/0x7a0
          kthread+0x1ac/0x1c0
          ret_from_kernel_thread+0x5c/0x80
      
      The latter trace is due to the following sequence:
      
        page_alloc_cpu_dead
          drain_pages
            drain_pages_zone
              free_pcppages_bulk
      
      where drain_pages() in this case is called under the assumption that
      the unplugged cpu is no longer executing. To ensure that is the case,
      an early call is made to __cpu_die()->pseries_cpu_die(), which runs a
      loop that waits for the cpu to reach a halted state by polling its
      status via query-cpu-stopped-state RTAS calls. It only polls for 25
      iterations before giving up, however, and in the trace above this
      results in the following being printed only 0.1 seconds after the
      hotplug worker thread begins processing the unplug request:
      
        pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
        Querying DEAD? cpu 314 (314) shows 2
      
      At that point the worker thread assumes the unplugged CPU is in some
      unknown/dead state and proceeds with the cleanup, causing the race
      with the XIVE cleanup code executed by the unplugged CPU.
      
      Fix this by waiting indefinitely, but also making an effort to avoid
      spurious lockup messages by allowing for rescheduling after polling
      the CPU status and printing a warning if we wait for longer than 120s.
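
The wait described above can be sketched in user space as follows. This is only an illustration of the pattern (no iteration cap, yield after each poll, warn once after a timeout), not the kernel patch itself: `struct poll_env`, `wait_for_cpu_stopped()`, the `fake_*` helpers and the status values are all hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative status values; the real RTAS return codes may differ. */
enum cpu_state { CPU_STOPPED = 0, CPU_NOT_STOPPED = 2 };

/* Hypothetical hooks so the loop can run outside a kernel. */
struct poll_env {
    enum cpu_state (*query)(unsigned int cpu, void *ctx); /* one status poll */
    unsigned long (*now_s)(void *ctx);                    /* monotonic seconds */
    void (*yield)(void *ctx);                             /* let others run */
    void *ctx;
};

/*
 * Poll until the CPU reports stopped. There is no iteration cap; the loop
 * yields after every unsuccessful poll and warns once when more than
 * warn_after_s seconds have passed. Returns the number of polls made.
 */
unsigned long wait_for_cpu_stopped(const struct poll_env *env, unsigned int cpu,
                                   unsigned long warn_after_s, bool *warned)
{
    assert(env && env->query && env->now_s && env->yield && warned);

    unsigned long start = env->now_s(env->ctx);
    unsigned long polls = 0;

    *warned = false;
    for (;;) {
        polls++;
        if (env->query(cpu, env->ctx) == CPU_STOPPED)
            return polls;
        if (!*warned && env->now_s(env->ctx) - start > warn_after_s) {
            fprintf(stderr, "cpu %u still not stopped after %lus\n",
                    cpu, warn_after_s);
            *warned = true;
        }
        env->yield(env->ctx);
    }
}

/*
 * Fake CPU for trying the loop: it reports stopped on the polls_left-th
 * poll, and every yield advances the fake clock by step_s seconds.
 */
struct fake_cpu { unsigned long polls_left, clock_s, step_s; };

enum cpu_state fake_query(unsigned int cpu, void *ctx)
{
    struct fake_cpu *f = ctx;
    (void)cpu;
    if (f->polls_left > 0)
        f->polls_left--;
    return f->polls_left == 0 ? CPU_STOPPED : CPU_NOT_STOPPED;
}

unsigned long fake_now_s(void *ctx) { return ((struct fake_cpu *)ctx)->clock_s; }

void fake_yield(void *ctx)
{
    struct fake_cpu *f = ctx;
    f->clock_s += f->step_s;
}
```

With a fake CPU that needs five polls and a clock that advances 50 seconds per yield, the loop crosses the 120-second mark before the CPU stops, so it warns once and then keeps waiting rather than giving up.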
      
      Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Tested-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      [mpe: Trim oopses in change log slightly for readability]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
    • powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined · 7bee31ad
      Christophe Leroy authored
      When MODULES_VADDR is defined, is_module_segment() shall check the
      address against it instead of checking against VMALLOC_START.
      
      Fixes: 6ca05532 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/07884ed033c31e074747b7eb8eaa329d15db07ec.1596641219.git.christophe.leroy@csgroup.eu
    • powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32 · 48d2f040
      Christophe Leroy authored
      On BOOK3S_32, when we have modules and strict kernel RWX, modules
      are not in vmalloc space but in a dedicated segment that is
      below PAGE_OFFSET.
      
      So KASAN_SHADOW_START must take it into account.
      
      MODULES_VADDR can't be used because it is not defined yet
      in kasan.h
      
      Fixes: 6ca05532 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/6eddca2d5611fd57312a88eae31278c87a8fc99d.1596641224.git.christophe.leroy@csgroup.eu
  12. 17 Aug, 2020 7 commits
  13. 16 Aug, 2020 6 commits
    • Linux 5.9-rc1 · 9123e3a7
      Linus Torvalds authored
    • Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block · 2cc3c4b3
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A few different things in here.
      
        Seems like syzbot got some more io_uring bits wired up, and we got a
        handful of reports and the associated fixes are in here.
      
        General fixes too, and a lot of them marked for stable.
      
        Lastly, a bit of fallout from the async buffered reads, where we now
        more easily trigger short reads. Some applications don't really like
        that, so the io_read() code now handles short reads internally, and
        got a cleanup along the way so that it's now easier to read (and
        documented). We're now passing tests that failed before"
      
      * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
        io_uring: short circuit -EAGAIN for blocking read attempt
        io_uring: sanitize double poll handling
        io_uring: internally retry short reads
        io_uring: retain iov_iter state over io_read/io_write calls
        task_work: only grab task signal lock when needed
        io_uring: enable lookup of links holding inflight files
        io_uring: fail poll arm on queue proc failure
        io_uring: hold 'ctx' reference around task_work queue + execute
        fs: RWF_NOWAIT should imply IOCB_NOIO
        io_uring: defer file table grabbing request cleanup for locked requests
        io_uring: add missing REQ_F_COMP_LOCKED for nested requests
        io_uring: fix recursive completion locking on oveflow flush
        io_uring: use TWA_SIGNAL for task_work uncondtionally
        io_uring: account locked memory before potential error case
        io_uring: set ctx sq/cq entry count earlier
        io_uring: Fix NULL pointer dereference in loop_rw_iter()
        io_uring: add comments on how the async buffered read retry works
        io_uring: io_async_buf_func() need not test page bit
    • parisc: fix PMD pages allocation by restoring pmd_alloc_one() · 6f6aea7e
      Mike Rapoport authored
      Commit 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one()
      and pmd_free_one()") converted parisc to use the generic version of
      pmd_alloc_one(), but it missed the fact that parisc uses order-1 pages for
      PMD.
      
      Restore the original version of pmd_alloc_one() for parisc, just use
      GFP_PGTABLE_KERNEL that implies __GFP_ZERO instead of GFP_KERNEL and
      memset.
      
      Fixes: 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block · 4b6c093e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes on the block side of things:
      
         - Discard granularity fix (Coly)
      
         - rnbd cleanups (Guoqing)
      
         - md error handling fix (Dan)
      
         - md sysfs fix (Junxiao)
      
         - Fix flush request accounting, which caused an IO slowdown for some
           configurations (Ming)
      
         - Properly propagate loop flag for partition scanning (Lennart)"
      
      * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
        block: fix double account of flush request's driver tag
        loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
        rnbd: no need to set bi_end_io in rnbd_bio_map_kern
        rnbd: remove rnbd_dev_submit_io
        md-cluster: Fix potential error pointer dereference in resize_bitmaps()
        block: check queue's limits.discard_granularity in __blkdev_issue_discard()
        md: get sysfs entry after redundancy attr group create
    • Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d84835b1
      Linus Torvalds authored
      Pull RISC-V fix from Palmer Dabbelt:
       "I collected a single fix during the merge window: we managed to break
        the early trap setup on !MMU, this fixes it"
      
      * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Setup exception vector for nommu platform
    • Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh · 5bbec3cf
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
        changes to arch/sh"
      
      * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
        sh: landisk: Add missing initialization of sh_io_port_base
        sh: bring syscall_set_return_value in line with other architectures
        sh: Add SECCOMP_FILTER
        sh: Rearrange blocks in entry-common.S
        sh: switch to copy_thread_tls()
        sh: use the generic dma coherent remap allocator
        sh: don't allow non-coherent DMA for NOMMU
        dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
        sh: unexport register_trapped_io and match_trapped_io_handler
        sh: don't include <asm/io_trapped.h> in <asm/io.h>
        sh: move the ioremap implementation out of line
        sh: move ioremap_fixed details out of <asm/io.h>
        sh: remove __KERNEL__ ifdefs from non-UAPI headers
        sh: sort the selects for SUPERH alphabetically
        sh: remove -Werror from Makefiles
        sh: Replace HTTP links with HTTPS ones
        arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
        sh: stacktrace: Remove stacktrace_ops.stack()
        sh: machvec: Modernize printing of kernel messages
        sh: pci: Modernize printing of kernel messages
        ...
  14. 15 Aug, 2020 1 commit
    • io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      Jens Axboe authored
      One case was missed in the short IO retry handling: hitting -EAGAIN
      on a blocking read attempt (e.g. from io-wq context). This is a
      problem on sockets that are marked as non-blocking when created,
      because they don't carry any REQ_F_NOWAIT information to help us
      terminate them instead of perpetually retrying.
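
As a rough user-space analogy (plain POSIX read(2) on a non-blocking descriptor, not io_uring internals), a short-read retry loop needs the same kind of exit: when the descriptor reports EAGAIN, it should stop and return what it has instead of retrying forever. The helper names `read_retry_short()` and `set_nonblocking()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Mark fd as non-blocking. Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read up to len bytes, retrying after short reads. Stops at EOF, and also
 * stops (setting *would_block to 1) when the descriptor reports EAGAIN,
 * rather than spinning on a non-blocking fd that has no data.
 * Returns the number of bytes read, or -1 on error with nothing read.
 */
ssize_t read_retry_short(int fd, void *buf, size_t len, int *would_block)
{
    assert(buf != NULL && would_block != NULL);

    size_t done = 0;

    *would_block = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;      /* short read: keep going */
            continue;
        }
        if (n == 0)
            break;                  /* EOF */
        if (errno == EINTR)
            continue;               /* interrupted: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            *would_block = 1;       /* short circuit instead of retrying */
            break;
        }
        return done ? (ssize_t)done : -1;
    }
    return (ssize_t)done;
}
```

For example, on a non-blocking pipe holding five bytes, a ten-byte request returns five bytes with `*would_block` set, leaving it to the caller to decide whether to wait for readiness.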
      
      Fixes: 227c0c96 ("io_uring: internally retry short reads")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>