1. 21 Mar, 2018 3 commits
  2. 19 Mar, 2018 1 commit
  3. 18 Mar, 2018 5 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9e1909b9
      Linus Torvalds authored
      Pull x86/pti updates from Thomas Gleixner:
       "Another set of melted spectrum updates:
      
         - Iron out the last late microcode loading issues by actually
           checking whether new microcode is present and preventing the CPU
           synchronization to run into a timeout induced hang.
      
         - Remove Skylake C2 from the microcode blacklist according to the
           latest Intel documentation
      
         - Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
           not. Enhance the selftests to catch that kind of issue
      
         - Annotate indirect calls/jumps for objtool on 32bit. This is not a
           functional issue, but for consistency sake its the right thing to
           do.
      
         - Fix a jump label build warning observed on SPARC64 which uses 32bit
           storage for the code location which is casted to 64 bit pointer w/o
           extending it to 64bit first.
      
         - Add two new cpufeature bits. Not really an urgent issue, but
           provides them for both x86 and x86/kvm work. No impact on the
           current kernel"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode: Fix CPU synchronization routine
        x86/microcode: Attempt late loading only when new microcode is present
        x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
        jump_label: Fix sparc64 warning
        x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
        x86/vm86/32: Fix POPF emulation
        selftests/x86/entry_from_vm86: Add test cases for POPF
        selftests/x86/entry_from_vm86: Exit with 1 if we fail
        x86/cpufeatures: Add Intel PCONFIG cpufeature
        x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
      9e1909b9
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · df4fe178
      Linus Torvalds authored
      Pull x86 fix from Thomas Gleixner:
       "A single fix for vmalloc_fault() which uses p*d_huge() unconditionally
        whether CONFIG_HUGETLBFS is set or not. In case of CONFIG_HUGETLBFS=n
        this results in a crash as p*d_huge() returns 0 in that case"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Fix vmalloc_fault to use pXd_large
      df4fe178
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d2149e13
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Three fixes for irq chip drivers:
      
         - Make sure the allocations in the GIC-V3 ITS driver are large enough
           to accomodate the interrupt space
      
         - Fix a misplaced __iomem annotation which causes a splat of 26
           sparse warnings
      
         - Remove an unused function in the IMX GPCV2 driver which causes
           build warnings"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/irq-imx-gpcv2: Remove unused function
        irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
        irqchip/gic-v3-its: Fix misplaced __iomem annotations
      d2149e13
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 23fe85ae
      Linus Torvalds authored
      Pull EFI fix from Thomas Gleixner:
       "A single fix to prevent partially initialized pointers in mixed mode
        (64bit kernel on 32bit UEFI)"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/libstub/tpm: Initialize pointer variables to zero for mixed mode
      23fe85ae
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3cd1d327
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "PPC:
         - fix bug leading to lost IPIs and smp_call_function_many() lockups
           on POWER9
      
        ARM:
         - locking fix
         - reset fix
         - GICv2 multi-source SGI injection fix
         - GICv2-on-v3 MMIO synchronization fix
         - make the console less verbose.
      
        x86:
         - fix device passthrough on AMD SME"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Fix device passthrough when SME is active
        kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3
        KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid
        KVM: arm/arm64: Reduce verbosity of KVM init log
        KVM: arm/arm64: Reset mapped IRQs on VM reset
        KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
        KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending
        KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry
      3cd1d327
  4. 17 Mar, 2018 1 commit
    • John David Anglin's avatar
      parisc: Handle case where flush_cache_range is called with no context · 9ef0f88f
      John David Anglin authored
      Just when I had decided that flush_cache_range() was always called with
      a valid context, Helge reported two cases where the
      "BUG_ON(!vma->vm_mm->context);" was hit on the phantom buildd:
      
       kernel BUG at /mnt/sdb6/linux/linux-4.15.4/arch/parisc/kernel/cache.c:587!
       CPU: 1 PID: 3254 Comm: kworker/1:2 Tainted: G D 4.15.0-1-parisc64-smp #1 Debian 4.15.4-1+b1
       Workqueue: events free_ioctx
        IAOQ[0]: flush_cache_range+0x164/0x168
        IAOQ[1]: flush_cache_page+0x0/0x1c8
        RP(r2): unmap_page_range+0xae8/0xb88
       Backtrace:
        [<00000000404a6980>] unmap_page_range+0xae8/0xb88
        [<00000000404a6ae0>] unmap_single_vma+0xc0/0x188
        [<00000000404a6cdc>] zap_page_range_single+0x134/0x1f8
        [<00000000404a702c>] unmap_mapping_range+0x1cc/0x208
        [<0000000040461518>] truncate_pagecache+0x98/0x108
        [<0000000040461624>] truncate_setsize+0x9c/0xb8
        [<00000000405d7f30>] put_aio_ring_file+0x80/0x100
        [<00000000405d803c>] aio_free_ring+0x8c/0x290
        [<00000000405d82c0>] free_ioctx+0x80/0x180
        [<0000000040284e6c>] process_one_work+0x21c/0x668
        [<00000000402854c4>] worker_thread+0x20c/0x778
        [<0000000040291d44>] kthread+0x2d4/0x2e0
        [<0000000040204020>] end_fault_vector+0x20/0xc0
      
      This indicates that we need to handle the no context case in
      flush_cache_range() as we do in flush_cache_mm().
      
      In thinking about this, I realized that we don't need to flush the TLB
      when there is no context.  So, I added context checks to the large flush
      cases in flush_cache_mm() and flush_cache_range().  The large flush case
      occurs frequently in flush_cache_mm() and the change should improve fork
      performance.
      
      The v2 version of this change removes the BUG_ON from flush_cache_page()
      by skipping the TLB flush when there is no context.  I also added code
      to flush the TLB in flush_cache_mm() and flush_cache_range() when we
      have a context that's not current.  Now all three routines handle TLB
      flushes in a similar manner.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      9ef0f88f
  5. 16 Mar, 2018 17 commits
  6. 15 Mar, 2018 10 commits
    • Eric W. Biederman's avatar
      fs: Teach path_connected to handle nfs filesystems with multiple roots. · 95dd7758
      Eric W. Biederman authored
      On nfsv2 and nfsv3 the nfs server can export subsets of the same
      filesystem and report the same filesystem identifier, so that the nfs
      client can know they are the same filesystem.  The subsets can be from
      disjoint directory trees.  The nfsv2 and nfsv3 filesystems provides no
      way to find the common root of all directory trees exported form the
      server with the same filesystem identifier.
      
      The practical result is that in struct super s_root for nfs s_root is
      not necessarily the root of the filesystem.  The nfs mount code sets
      s_root to the root of the first subset of the nfs filesystem that the
      kernel mounts.
      
      This effects the dcache invalidation code in generic_shutdown_super
      currently called shrunk_dcache_for_umount and that code for years
      has gone through an additional list of dentries that might be dentry
      trees that need to be freed to accomodate nfs.
      
      When I wrote path_connected I did not realize nfs was so special, and
      it's hueristic for avoiding calling is_subdir can fail.
      
      The practical case where this fails is when there is a move of a
      directory from the subtree exposed by one nfs mount to the subtree
      exposed by another nfs mount.  This move can happen either locally or
      remotely.  With the remote case requiring that the move directory be cached
      before the move and that after the move someone walks the path
      to where the move directory now exists and in so doing causes the
      already cached directory to be moved in the dcache through the magic
      of d_splice_alias.
      
      If someone whose working directory is in the move directory or a
      subdirectory and now starts calling .. from the initial mount of nfs
      (where s_root == mnt_root), then path_connected as a heuristic will
      not bother with the is_subdir check.  As s_root really is not the root
      of the nfs filesystem this heuristic is wrong, and the path may
      actually not be connected and path_connected can fail.
      
      The is_subdir function might be cheap enough that we can call it
      unconditionally.  Verifying that will take some benchmarking and
      the result may not be the same on all kernels this fix needs
      to be backported to.  So I am avoiding that for now.
      
      Filesystems with snapshots such as nilfs and btrfs do something
      similar.  But as the directory tree of the snapshots are disjoint
      from one another and from the main directory tree rename won't move
      things between them and this problem will not occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Fixes: 397d425d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      95dd7758
    • Rodrigo Vivi's avatar
      Merge tag 'gvt-fixes-2018-03-15' of https://github.com/intel/gvt-linux into drm-intel-fixes · 05b429a8
      Rodrigo Vivi authored
      gvt-fixes-2018-03-15
      
      - Two warnings fix for runtime pm and usr copy (Xiong, Zhenyu)
      - OA context fix for vGPU profiling (Min)
      - privilege batch buffer reloc fix (Fred)
      Signed-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180315100023.5n5a74afky6qinoh@zhen-hp.sh.intel.com
      05b429a8
    • David S. Miller's avatar
      sparc64: Fix regression in pmdp_invalidate(). · cfb61b5e
      David S. Miller authored
      pmdp_invalidate() was changed to update the pmd atomically
      (to not lose dirty/access bits) and return the original pmd
      value.
      
      However, in doing so, we lost a lot of the essential work that
      set_pmd_at() does, namely to update hugepage mapping counts and
      queuing up the batched TLB flush entry.
      
      Thus we were not flushing entries out of the TLB when making
      such PMD changes.
      
      Fix this by abstracting the accounting work of set_pmd_at() out into a
      separate function, and call it from pmdp_establish().
      
      Fixes: a8e654f0 ("sparc64: update pmdp_invalidate() to return old pmd value")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfb61b5e
    • Paolo Bonzini's avatar
      Merge tag 'kvm-ppc-fixes-4.16-2' of... · 52be7a46
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-fixes-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
      
      Fix for PPC KVM for 4.16
      
      - Fix bug leading to lost IPIs on POWER9 and hence to other CPUs reporting
        lockups in smp_call_function_many().
      52be7a46
    • Paolo Bonzini's avatar
      Merge tag 'kvm-arm-fixes-for-v4.16-2' of... · bb9b4dbe
      Paolo Bonzini authored
      Merge tag 'kvm-arm-fixes-for-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      kvm/arm fixes for 4.16, take 2
      
      - Peace of mind locking fix in vgic_mmio_read_pending
      - Allow hw-mapped interrupts to be reset when the VM resets
      - Fix GICv2 multi-source SGI injection
      - Fix MMIO synchronization for GICv2 on v3 emulation
      - Remove excess verbosity on the console
      bb9b4dbe
    • Linus Torvalds's avatar
      Merge tag 'sound-4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e2c15aff
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A series of small fixes in ASoC, HD-audio and core stuff:
      
         - a UAF fix in ALSA PCM core
      
         - yet more hardening for ALSA sequencer
      
         - a regression fix for the previous HD-audio power_save option change
      
         - various ASoC codec fixes (sgtl5000, rt5651, hdmi-codec, wm_adsp)
      
         - minor ASoC platform fixes (AMD ACP, sun4i)"
      
      * tag 'sound-4.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Revert power_save option default value
        ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats()
        ALSA: seq: Clear client entry before deleting else at closing
        ALSA: seq: Fix possible UAF in snd_seq_check_queue()
        ASoC: amd: 16bit resolution support for i2s sp instance
        ASoC: wm_adsp: For TLV controls only register TLV get/set
        ASoC: sun4i-i2s: Fix RX slot number of SUN8I
        ASoC: hdmi-codec: Fix module unloading caused kernel crash
        ASoC: rt5651: Fix regcache sync errors on resume
        ASoC: sgtl5000: Fix suspend/resume
        MAINTAINERS: Add myself as sgtl5000 maintainer
        ASoC: samsung: Add the DT binding files entry to MAINTAINERS
        sgtl5000: change digital_mute policy
      e2c15aff
    • Linus Torvalds's avatar
      Merge tag 'for-4.16/dm-fixes-3' of... · 667058ae
      Linus Torvalds authored
      Merge tag 'for-4.16/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - a stable DM multipath fix to restore ability to pass integrity data
      
       - two DM multipath fixes for a fix that was merged into 4.16-rc5
      
      * tag 'for-4.16/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm mpath: fix passing integrity data
        dm mpath: eliminate need to use scsi_device_from_queue
        dm mpath: fix uninitialized 'pg_init_wait' waitqueue_head NULL pointer
      667058ae
    • Zhenyu Wang's avatar
      drm/i915/gvt: fix user copy warning by whitelist workload rb_tail field · 850555d1
      Zhenyu Wang authored
      This is to fix warning got as:
      
      [ 6730.476938] ------------[ cut here ]------------
      [ 6730.476979] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'gvt-g_vgpu_workload' (offset 120, size 4)!
      [ 6730.477021] WARNING: CPU: 2 PID: 441 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0
      [ 6730.477042] Modules linked in: tun(E) bridge(E) stp(E) llc(E) kvmgt(E) x86_pkg_temp_thermal(E) vfio_mdev(E) intel_powerclamp(E) mdev(E) coretemp(E) vfio_iommu_type1(E) vfio(E) kvm_intel(E) kvm(E) hid_generic(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) usbhid(E) i915(E) crc32c_intel(E) hid(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) intel_cstate(E) idma64(E) evdev(E) virt_dma(E) iTCO_wdt(E) intel_uncore(E) intel_rapl_perf(E) intel_lpss_pci(E) sg(E) shpchp(E) mei_me(E) pcspkr(E) iTCO_vendor_support(E) intel_lpss(E) intel_pch_thermal(E) prime_numbers(E) mei(E) mfd_core(E) video(E) acpi_pad(E) button(E) binfmt_misc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) e1000e(E) xhci_pci(E) sdhci_pci(E)
      [ 6730.477244]  ptp(E) cqhci(E) xhci_hcd(E) pps_core(E) sdhci(E) mmc_core(E) i2c_i801(E) usbcore(E) thermal(E) fan(E)
      [ 6730.477276] CPU: 2 PID: 441 Comm: gvt workload 0 Tainted: G            E    4.16.0-rc1-gvt-staging-0213+ #127
      [ 6730.477303] Hardware name:  /NUC6i5SYB, BIOS SYSKLi35.86A.0039.2016.0316.1747 03/16/2016
      [ 6730.477326] RIP: 0010:usercopy_warn+0x7e/0xa0
      [ 6730.477340] RSP: 0018:ffffba6301223d18 EFLAGS: 00010286
      [ 6730.477355] RAX: 0000000000000000 RBX: ffff8f41caae9838 RCX: 0000000000000006
      [ 6730.477375] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f41dad166f0
      [ 6730.477395] RBP: 0000000000000004 R08: 0000000000000576 R09: 0000000000000000
      [ 6730.477415] R10: ffffffffb1293fb2 R11: 00000000ffffffff R12: 0000000000000001
      [ 6730.477447] R13: ffff8f41caae983c R14: ffff8f41caae9838 R15: 00007f183ca2b000
      [ 6730.477467] FS:  0000000000000000(0000) GS:ffff8f41dad00000(0000) knlGS:0000000000000000
      [ 6730.477489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6730.477506] CR2: 0000559462817291 CR3: 000000028b46c006 CR4: 00000000003626e0
      [ 6730.477526] Call Trace:
      [ 6730.477537]  __check_object_size+0x9c/0x1a0
      [ 6730.477562]  __kvm_write_guest_page+0x45/0x90 [kvm]
      [ 6730.477585]  kvm_write_guest+0x46/0x80 [kvm]
      [ 6730.477599]  kvmgt_rw_gpa+0x9b/0xf0 [kvmgt]
      [ 6730.477642]  workload_thread+0xa38/0x1040 [i915]
      [ 6730.477659]  ? do_wait_intr_irq+0xc0/0xc0
      [ 6730.477673]  ? finish_wait+0x80/0x80
      [ 6730.477707]  ? clean_workloads+0x120/0x120 [i915]
      [ 6730.477722]  kthread+0x111/0x130
      [ 6730.477733]  ? _kthread_create_worker_on_cpu+0x60/0x60
      [ 6730.477750]  ? exit_to_usermode_loop+0x6f/0xb0
      [ 6730.477766]  ret_from_fork+0x35/0x40
      [ 6730.477777] Code: 48 c7 c0 20 e3 25 b1 48 0f 44 c2 41 50 51 41 51 48 89 f9 49 89 f1 4d 89 d8 4c 89 d2 48 89 c6 48 c7 c7 78 e3 25 b1 e8 b2 bc e4 ff <0f> ff 48 83 c4 18 c3 48 c7 c6 09 d0 26 b1 49 89 f1 49 89 f3 eb
      [ 6730.477849] ---[ end trace cae869c1c323e45a ]---
      
      By whitelist guest page write from workload struct allocated from kmem cache.
      Reviewed-by: default avatarHang Yuan <hang.yuan@linux.intel.com>
      Signed-off-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      (cherry picked from commit 5627705406874df57fdfad3b4e0c9aedd3b007df)
      850555d1
    • fred gao's avatar
      drm/i915/gvt: Correct the privilege shadow batch buffer address · ef75c685
      fred gao authored
      Once the ring buffer is copied to ring_scan_buffer and scanned,
      the shadow batch buffer start address is only updated into
      ring_scan_buffer, not the real ring address allocated through
      intel_ring_begin in later copy_workload_to_ring_buffer.
      
      This patch is only to set the right shadow batch buffer address
      from Ring buffer, not include the shadow_wa_ctx.
      
      v2:
      - refine some comments. (Zhenyu)
      v3:
      - fix typo in title. (Zhenyu)
      v4:
      - remove the unnecessary comments. (Zhenyu)
      - add comments in bb_start_cmd_va update. (Zhenyu)
      
      Fixes: 0a53bc07 ("drm/i915/gvt: Separate cmd scan from request allocation")
      Cc: stable@vger.kernel.org  # v4.15
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: Yulei Zhang <yulei.zhang@intel.com>
      Signed-off-by: default avatarfred gao <fred.gao@intel.com>
      Signed-off-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      ef75c685
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 0aa3fdb8
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is four patches, consisting of one regression from the merge
        window (qla2xxx), one long-standing memory leak (sd_zbc), one event
        queue mislabelling which we want to eliminate to discourage the
        pattern (mpt3sas), and one behaviour change because re-reading the
        partition table shouldn't clear the ro flag"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Keep disk read-only when re-reading partition
        scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure
        scsi: sd_zbc: Fix potential memory leak
        scsi: mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM
      0aa3fdb8
  7. 14 Mar, 2018 3 commits
    • Joern Engel's avatar
      btree: avoid variable-length allocations · 8df3aaaf
      Joern Engel authored
      geo->keylen cannot be larger than 4.  So we might as well make
      fixed-size allocations.
      
      Given the one remaining user, geo->keylen cannot even be larger than 1.
      Logfs used to have 64bit and 128bit keys, tcm_qla2xxx only has 32bit
      keys.  But let's not break the code if we don't have to.
      Signed-off-by: default avatarJoern Engel <joern@purestorage.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8df3aaaf
    • Linus Torvalds's avatar
      Merge branch 'percpu_ref-rcu-audit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc · fed8f509
      Linus Torvalds authored
      Pull percpu_ref rcu fixes from Tejun Heo:
       "Jann Horn found that aio was depending on the internal RCU grace
        periods of percpu-ref and that it's broken because aio uses regular
        RCU while percpu_ref uses sched-RCU.
      
        Depending on percpu_ref's internal grace periods isn't a good idea
        because
      
         - The RCU type might not match.
      
         - percpu_ref's grace periods are used to switch to atomic mode. They
           aren't between the last put and the invocation of the last release.
           This is easy to get confused about and can lead to subtle bugs.
      
         - percpu_ref might not have grace periods at all depending on its
           current operation mode.
      
        This patchset audits and fixes percpu_ref users for their RCU usages"
      
      [ There's a continuation of this series that clarifies percpu_ref
        documentation that the internal grace periods must not be depended
        upon, and introduces rcu_work to simplify bouncing to a workqueue
        after an RCU grace period.
      
        That will go in for 4.17 - this is just the minimal set with the fixes
        that are tagged for -stable ]
      
      * 'percpu_ref-rcu-audit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
        RDMAVT: Fix synchronization around percpu_ref
        fs/aio: Use RCU accessors for kioctx_table->table[]
        fs/aio: Add explicit RCU grace period when freeing kioctx
      fed8f509
    • Ard Biesheuvel's avatar
      Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment" · 3e04040d
      Ard Biesheuvel authored
      This reverts commit 864b75f9.
      
      Commit 864b75f9 ("mm/page_alloc: fix memmap_init_zone pageblock
      alignment") modified the logic in memmap_init_zone() to initialize
      struct pages associated with invalid PFNs, to appease a VM_BUG_ON()
      in move_freepages(), which is redundant by its own admission, and
      dereferences struct page fields to obtain the zone without checking
      whether the struct pages in question are valid to begin with.
      
      Commit 864b75f9 only makes it worse, since the rounding it does
      may cause pfn assume the same value it had in a prior iteration of
      the loop, resulting in an infinite loop and a hang very early in the
      boot. Also, since it doesn't perform the same rounding on start_pfn
      itself but only on intermediate values following an invalid PFN, we
      may still hit the same VM_BUG_ON() as before.
      
      So instead, let's fix this at the core, and ensure that the BUG
      check doesn't dereference struct page fields of invalid pages.
      
      Fixes: 864b75f9 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
      Tested-by: default avatarJan Glauber <jglauber@cavium.com>
      Tested-by: default avatarShanker Donthineni <shankerd@codeaurora.org>
      Cc: Daniel Vacek <neelx@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3e04040d