1. 08 Jul, 2020 3 commits
    • Merge branch 'sched/urgent' · faa2fd7c
      Peter Zijlstra authored
      faa2fd7c
    • sched: Fix unreliable rseq cpu_id for new tasks · ce3614da
      Mathieu Desnoyers authored
      While integrating rseq into glibc and replacing glibc's sched_getcpu
      implementation with rseq, glibc's tests discovered an issue with
      incorrect __rseq_abi.cpu_id field value right after the first time
      a newly created process issues sched_setaffinity.
      
      For the record, it triggers after building glibc and running tests, and
      then issuing:
      
        for x in {1..2000} ; do posix/tst-affinity-static  & done
      
      and shows up as:
      
      error: Unexpected CPU 2, expected 0
      error: Unexpected CPU 2, expected 0
      error: Unexpected CPU 2, expected 0
      error: Unexpected CPU 2, expected 0
      error: Unexpected CPU 138, expected 0
      error: Unexpected CPU 138, expected 0
      error: Unexpected CPU 138, expected 0
      error: Unexpected CPU 138, expected 0
      
      This is caused by the scheduler invoking __set_task_cpu() directly from
      sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which
      is done by set_task_cpu().
      
      Add the missing rseq_migrate() to both functions. The only other direct
      use of __set_task_cpu() is done by init_idle(), which does not involve a
      user-space task.
      
      Based on my testing with the glibc test-case, just adding rseq_migrate()
      to wake_up_new_task() is sufficient to fix the observed issue. Also add
      it to sched_fork() to keep things consistent.
      
      The reason why this never triggered so far with the rseq/basic_test
      selftest is unclear.
      
      The current use of sched_getcpu(3) does not typically require it to be
      always accurate. However, use of the __rseq_abi.cpu_id field within rseq
      critical sections requires it to be accurate. If it is not accurate, it
      can cause corruption in the per-cpu data targeted by rseq critical
      sections in user-space.
      Reported-By: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-By: Florian Weimer <fweimer@redhat.com>
      Cc: stable@vger.kernel.org # v4.18+
      Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com
      ce3614da
    • sched: Fix loadavg accounting race · dbfb089d
      Peter Zijlstra authored
      The recent commit:
      
        c6e7bd7a ("sched/core: Optimize ttwu() spinning on p->on_cpu")
      
      moved these lines in ttwu():
      
      	p->sched_contributes_to_load = !!task_contributes_to_load(p);
      	p->state = TASK_WAKING;
      
      up before:
      
      	smp_cond_load_acquire(&p->on_cpu, !VAL);
      
      into the 'p->on_rq == 0' block, with the thinking that once we hit
      schedule() the current task cannot change its ->state anymore. And
      while this is true, it is both incorrect and flawed.
      
      It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0'
      to avoid weak hardware from re-ordering things for us. This can fairly
      easily be achieved by relying on the control-dependency already in
      place.
      
      The second problem, which makes the flaw in the original argument, is
      that while schedule() will not change prev->state, it will read it a
      number of times (arguably too many times since it's marked volatile).
      The previous condition 'p->on_cpu == 0' was sufficient because that
      indicates schedule() has completed, and will no longer read
      prev->state. So now the trick is to make the same true for the (much)
      earlier 'prev->on_rq == 0' case.
      
      Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0'
      assignment needs to be a RELEASE, but adding additional ordering to
      schedule() is an unwelcome proposition at the best of times, doubly so
      for mere accounting.
      
      Luckily we can push the prev->state load up before rq->lock, with the
      only caveat that we then have to re-read the state after. However, we
      know that if it changed, we no longer have to worry about the blocking
      path. This gives us the required ordering: if we block, we did the
      prev->state load before an (effective) smp_mb(), and the p->on_rq store
      need not change.
      
      With this we end up with the effective ordering:
      
      	LOAD p->state           LOAD-ACQUIRE p->on_rq == 0
      	MB
      	STORE p->on_rq, 0       STORE p->state, TASK_WAKING
      
      which ensures the TASK_WAKING store happens after the prev->state
      load, and all is well again.
      
      Fixes: c6e7bd7a ("sched/core: Optimize ttwu() spinning on p->on_cpu")
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Dave Jones <davej@codemonkey.org.uk>
      Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net
      dbfb089d
  2. 05 Jul, 2020 14 commits
    • Linux 5.8-rc4 · dcb7fd82
      Linus Torvalds authored
      dcb7fd82
    • x86/ldt: use "pr_info_once()" instead of open-coding it badly · bb5a93aa
      Linus Torvalds authored
      Using a mutex for "print this warning only once" is so overdesigned as
      to be actively offensive to my sensitive stomach.
      
      Just use "pr_info_once()" that already does this, although in a
      (harmlessly) racy manner that can in theory cause the message to be
      printed twice if more than one CPU races on that "is this the first
      time" test.
      
      [ If somebody really cares about that harmless data race (which sounds
        very unlikely indeed), that person can trivially fix printk_once() by
        using a simple atomic access, preferably with an optimistic non-atomic
        test first before even bothering to treat the pointless "make sure it
        is _really_ just once" case.
      
        A mutex is most definitely never the right primitive to use for
        something like this. ]
      
      Yes, this is a small and meaningless detail in a code path that hardly
      matters.  But let's keep some code quality standards here, and not
      accept outrageously bad code.
      
      Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bb5a93aa
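
The bracketed note in the commit above suggests that a strictly-once message could be had with a simple atomic access guarded by a cheap optimistic test. The following is a minimal user-space sketch of that idea in C11, not the kernel's printk_once(). The helper name print_once() and its flag parameter are hypothetical, and a relaxed atomic load stands in for the "non-atomic test" so that the sketch stays free of data races.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space helper (an illustration, not kernel code).
 * Prints msg the first time it is called for a given flag and returns
 * true; every later call returns false without printing.
 */
static bool print_once(atomic_bool *printed, const char *msg)
{
	assert(printed != NULL);

	/* Cheap check first: the common case is "already printed". */
	if (atomic_load_explicit(printed, memory_order_relaxed))
		return false;

	/* Only the caller that swaps false -> true gets to print. */
	if (atomic_exchange(printed, true))
		return false;

	fputs(msg, stderr);
	return true;
}
```

A caller would keep one static atomic_bool per message, initialised to false, and pass its address on every call. No lock is involved at any point.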
    • Merge tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 72674d48
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A series of fixes for x86:
      
         - Reset MXCSR in kernel_fpu_begin() to prevent using a stale user
           space value.
      
         - Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
           whitelisted for split lock detection. Some CPUs which do not
           support it crash even when the MSR is written to 0 which is the
           default value.
      
         - Fix the XEN PV fallout of the entry code rework
      
         - Fix the 32bit fallout of the entry code rework
      
         - Add more selftests to ensure that these entry problems don't come
           back.
      
         - Disable 16 bit segments on XEN PV. It's not supported because XEN
           PV does not implement ESPFIX64"
      
      * tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ldt: Disable 16-bit segments on Xen PV
        x86/entry/32: Fix #MC and #DB wiring on x86_32
        x86/entry/xen: Route #DB correctly on Xen PV
        x86/entry, selftests: Further improve user entry sanity checks
        x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
        selftests/x86: Consolidate and fix get/set_eflags() helpers
        selftests/x86/syscall_nt: Clear weird flags after each test
        selftests/x86/syscall_nt: Add more flag combinations
        x86/entry/64/compat: Fix Xen PV SYSENTER frame setup
        x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C
        x86/entry: Assert that syscalls are on the right stack
        x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
        x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
      72674d48
    • Merge tag 'irq-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f23dbe18
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of interrupt chip driver fixes:
      
         - Ensure the atomicity of affinity updates in the GIC driver
      
         - Don't try to sleep in atomic context when waiting for the GICv4.1
           to respond. Use polling instead.
      
         - Typo fixes in Kconfig and warnings"
      
      * tag 'irq-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic: Atomically update affinity
        irqchip/riscv-intc: Fix a typo in a pr_warn()
        irqchip/gic-v4.1: Use readx_poll_timeout_atomic() to fix sleep in atomic
        irqchip/loongson-pci-msi: Fix a typo in Kconfig
      f23dbe18
    • Merge tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5465a324
      Linus Torvalds authored
      Pull rcu fixlet from Thomas Gleixner:
       "A single fix for a printk format warning in RCU"
      
      * tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rcuperf: Fix printk format warning
      5465a324
    • Merge tag 'kbuild-fixes-v5.8-2' of... · 4bc92736
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - fix various bugs in xconfig
      
       - fix some issues in cross-compilation using Clang
      
       - fix documentation
      
      * tag 'kbuild-fixes-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        .gitignore: Do not track `defconfig` from `make savedefconfig`
        kbuild: make Clang build userprogs for target architecture
        kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang
        kconfig: qconf: parse newer types at debug info
        kconfig: qconf: navigate menus on hyperlinks
        kconfig: qconf: don't show goback button on splitMode
        kconfig: qconf: simplify the goBack() logic
        kconfig: qconf: re-implement setSelected()
        kconfig: qconf: make debug links work again
        kconfig: qconf: make search fully work again on split mode
        kconfig: qconf: cleanup includes
        docs: kbuild: fix ReST formatting
        gcc-plugins: fix gcc-plugins directory path in documentation
      4bc92736
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 19a61a75
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Four small fixes in three drivers.
      
        The mptfusion one has actually caused user visible issues in certain
        kernel configurations"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: mptfusion: Don't use GFP_ATOMIC for larger DMA allocations
        scsi: libfc: Skip additional kref updating work event
        scsi: libfc: Handling of extra kref
        scsi: qla2xxx: Fix a condition in qla2x00_find_all_fabric_devs()
      19a61a75
    • Merge tag 'block-5.8-2020-07-05' of git://git.kernel.dk/linux-block · 29206c63
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe fixes from Christoph:
          - Fix crash in multi-path disk add (Christoph)
          - Fix ignore of identify error (Sagi)
      
       - Fix a compiler complaint that a function should be static (Wei)
      
      * tag 'block-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
        block: make function __bio_integrity_free() static
        nvme: fix a crash in nvme_mpath_add_disk
        nvme: fix identify error status silent ignore
      29206c63
    • Merge tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block · 9fbe565c
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "Andres reported a regression with the fix that was merged earlier this
        week, where his setup of using signals to interrupt io_uring CQ waits
        no longer worked correctly.
      
        Fix this, and also limit our use of TWA_SIGNAL to the case where we
        need it, and continue using TWA_RESUME for task_work as before.
      
        Since the original is marked for 5.7 stable, let's flush this one out
        early"
      
      * tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
        io_uring: fix regression with always ignoring signals in io_cqring_wait()
      9fbe565c
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 77834854
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "The usual driver fixes and documentation updates"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: mlxcpld: check correct size of maximum RECV_LEN packet
        i2c: add Kconfig help text for slave mode
        i2c: slave-eeprom: update documentation
        i2c: eg20t: Load module automatically if ID matches
        i2c: designware: platdrv: Set class based on DMI
        i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
      77834854
    • Merge tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 45a5ac7a
      Linus Torvalds authored
      Pull MIPS fixes from Thomas Bogendoerfer:
      
       - fix for missing hazard barrier
      
       - DT fix for ingenic
      
       - DT fix of GPHY names for lantiq
      
       - fix usage of smp_processor_id() while preemption is enabled
      
      * tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: Do not use smp_processor_id() in preemptible code
        MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
        MIPS: ingenic: gcw0: Fix HP detection GPIO.
        MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names
      45a5ac7a
    • MIPS: Do not use smp_processor_id() in preemptible code · 5868347a
      Xingxing Su authored
      Use preempt_disable() to fix the following bug under CONFIG_DEBUG_PREEMPT.
      
      [   21.915305] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-mip/1056
      [   21.923996] caller is do_ri+0x1d4/0x690
      [   21.927921] CPU: 0 PID: 1056 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
      [   21.934913] Stack : 0000000000000001 ffffffff81370000 ffffffff8071cd60 a80f926d5ac95694
      [   21.942984]         a80f926d5ac95694 0000000000000000 98000007f0043c88 ffffffff80f2fe40
      [   21.951054]         0000000000000000 0000000000000000 0000000000000001 0000000000000000
      [   21.959123]         ffffffff802d60cc 98000007f0043dd8 ffffffff81f4b1e8 ffffffff81f60000
      [   21.967192]         ffffffff81f60000 ffffffff80fe0000 ffff000000000000 0000000000000000
      [   21.975261]         fffffffff500cce1 0000000000000001 0000000000000002 0000000000000000
      [   21.983331]         ffffffff80fe1a40 0000000000000006 ffffffff8077f940 0000000000000000
      [   21.991401]         ffffffff81460000 98000007f0040000 98000007f0043c80 000000fffba8cf20
      [   21.999471]         ffffffff8071cd60 0000000000000000 0000000000000000 0000000000000000
      [   22.007541]         0000000000000000 0000000000000000 ffffffff80212ab4 a80f926d5ac95694
      [   22.015610]         ...
      [   22.018086] Call Trace:
      [   22.020562] [<ffffffff80212ab4>] show_stack+0xa4/0x138
      [   22.025732] [<ffffffff8071cd60>] dump_stack+0xf0/0x150
      [   22.030903] [<ffffffff80c73f5c>] check_preemption_disabled+0xf4/0x100
      [   22.037375] [<ffffffff80213b84>] do_ri+0x1d4/0x690
      [   22.042198] [<ffffffff8020b828>] handle_ri_int+0x44/0x5c
      [   24.359386] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-mip/1072
      [   24.368204] caller is do_ri+0x1a8/0x690
      [   24.372169] CPU: 4 PID: 1072 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
      [   24.379170] Stack : 0000000000000001 ffffffff81370000 ffffffff8071cd60 a80f926d5ac95694
      [   24.387246]         a80f926d5ac95694 0000000000000000 98001007ef06bc88 ffffffff80f2fe40
      [   24.395318]         0000000000000000 0000000000000000 0000000000000001 0000000000000000
      [   24.403389]         ffffffff802d60cc 98001007ef06bdd8 ffffffff81f4b818 ffffffff81f60000
      [   24.411461]         ffffffff81f60000 ffffffff80fe0000 ffff000000000000 0000000000000000
      [   24.419533]         fffffffff500cce1 0000000000000001 0000000000000002 0000000000000000
      [   24.427603]         ffffffff80fe0000 0000000000000006 ffffffff8077f940 0000000000000020
      [   24.435673]         ffffffff81460020 98001007ef068000 98001007ef06bc80 000000fffbbbb370
      [   24.443745]         ffffffff8071cd60 0000000000000000 0000000000000000 0000000000000000
      [   24.451816]         0000000000000000 0000000000000000 ffffffff80212ab4 a80f926d5ac95694
      [   24.459887]         ...
      [   24.462367] Call Trace:
      [   24.464846] [<ffffffff80212ab4>] show_stack+0xa4/0x138
      [   24.470029] [<ffffffff8071cd60>] dump_stack+0xf0/0x150
      [   24.475208] [<ffffffff80c73f5c>] check_preemption_disabled+0xf4/0x100
      [   24.481682] [<ffffffff80213b58>] do_ri+0x1a8/0x690
      [   24.486509] [<ffffffff8020b828>] handle_ri_int+0x44/0x5c
      Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      5868347a
    • MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen · fcec538e
      Hauke Mehrtens authored
      This resolves the hazard between the mtc0 in the change_c0_status() and
      the mfc0 in configure_exception_vector(). Without resolving this hazard
      configure_exception_vector() could read an old value and would restore
      this old value again. This would revert the changes change_c0_status()
      did. I checked this by printing out the read_c0_status() at the end of
      per_cpu_trap_init() and the ST0_MX is not set without this patch.
      
      The hazard is documented in the MIPS Architecture Reference Manual Vol.
      III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
      6.03 table 8.1 which includes:
      
         Producer | Consumer | Hazard
        ----------|----------|----------------------------
         mtc0     | mfc0     | any coprocessor 0 register
      
      I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
      There the change_c0_status() function would activate the DSPen by
      setting ST0_MX in the c0_status register. This was reverted and then the
      system got a DSP exception when the DSP registers were saved in
      save_dsp() in the first process switch. The crash looks like this:
      
      [    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
      [    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
      [    0.107070] Kernel panic - not syncing: Unexpected DSP exception
      [    0.113470] Rebooting in 1 seconds..
      
      We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
      not on the 24Kc based SoCs. We only saw it with kernel 5.4, not with
      kernel 4.19. In addition, we had to use GCC 8.4 or 9.X; with GCC 8.3 it
      did not happen.
      
      In the kernel I bisected this problem to commit 9012d011 ("compiler:
      allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
      reverted it also happened after commit 172dcd93 ("MIPS: Always
      allocate exception vector for MIPSr2+").
      
      Commit 0b24cae4 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
      makes similar changes to a different file. I am not sure if there are
      more places affected by this problem.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      fcec538e
    • .gitignore: Do not track `defconfig` from `make savedefconfig` · ba77dca5
      Paul Menzel authored
      Running `make savedefconfig` creates `defconfig` by default, which is
      currently on git’s radar; for example, `git status` lists this file as
      untracked.
      
      So, add the file to `.gitignore`, so it’s ignored by git.
      Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      ba77dca5
  3. 04 Jul, 2020 19 commits
  4. 03 Jul, 2020 4 commits
    • mm/page_alloc: fix documentation error · 8beeae86
      Joel Savitz authored
      When I increased the upper bound of the min_free_kbytes value in
      ee8eb9a5 ("mm/page_alloc: increase default min_free_kbytes bound") I
      forgot to tweak the above comment to reflect the new value.  This patch
      fixes that mistake.
      Signed-off-by: Joel Savitz <jsavitz@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Fabrizio D'Angelo <fdangelo@redhat.com>
      Link: http://lkml.kernel.org/r/20200624221236.29560-1-jsavitz@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8beeae86
    • vmalloc: fix the owner argument for the new __vmalloc_node_range callers · a3a66c38
      Christoph Hellwig authored
      Fix the recently added new __vmalloc_node_range callers to pass the
      correct values as the owner for display in /proc/vmallocinfo.
      
      Fixes: 800e26b8 ("x86/hyperv: allocate the hypercall page with only read and execute bits")
      Fixes: 10d5e97c ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page")
      Fixes: 7a0e27b2 ("mm: remove vmalloc_exec")
      Reported-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200627075649.2455097-1-hch@lst.de
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3a66c38
    • mm/cma.c: use exact_nid true to fix possible per-numa cma leak · 40366bd7
      Barry Song authored
      Calling cma_declare_contiguous_nid() with false exact_nid for per-numa
      reservation can easily cause cma leak and various confusion.  For example,
      mm/hugetlb.c is trying to reserve per-numa cma for gigantic pages.  But it
      can easily leak cma and make users confused when system has memoryless
      nodes.
      
      Suppose the system has 4 numa nodes and only numa node0 has memory.  If
      we set hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for 4
      different numa nodes.  Since exact_nid=false in the current code, all 4
      numa nodes will get cma successfully from node0, but hugetlb_cma[1 to 3]
      will never be available to hugepage, as mm/hugetlb.c will only allocate
      memory from hugetlb_cma[0].
      
      Now suppose the system has 4 numa nodes, where node0 and node2 have memory
      and the other nodes have none.  If we set hugetlb_cma=4G in bootargs,
      mm/hugetlb.c will get 4 cma areas for 4 different numa nodes.  Since
      exact_nid=false in the current code, all 4 numa nodes will get cma
      successfully from node0 or node2, but hugetlb_cma[1] and [3] will never be
      available to hugepage, as mm/hugetlb.c will only allocate memory from
      hugetlb_cma[0] and hugetlb_cma[2].  This causes a permanent leak of the cma
      areas which are supposed to be used by the memoryless nodes.
      
      Of course we can work around the issue by letting mm/hugetlb.c scan all cma
      areas in alloc_gigantic_page() even if node_mask includes node0 only.  That
      means when node_mask includes node0 only, we can get a page from
      hugetlb_cma[1] to hugetlb_cma[3].  But this will cause a kernel crash in
      free_gigantic_page() when it wants to free the page by:
      cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)
      
      On the other hand, exact_nid=false won't consider numa distance, so it
      might not be that useful to leverage cma areas on remote nodes.  I feel it
      is much simpler to make exact_nid true to make everything clear.  After
      that, memoryless nodes won't be able to reserve per-numa CMA from other
      nodes which have memory.
      
      Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Aslan Bakirov <aslan@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      40366bd7
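
The two scenarios in the commit above can be reproduced with a toy model. The sketch below is a user-space simulation only: struct reservation, reserve_per_node(), and the "fall back to the first node that has memory" rule are assumptions made for illustration and do not mirror the real memblock or cma code. It counts the per-node areas that end up backed by another node's memory, which are the areas the commit describes as leaked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of where one node's cma area ended up. */
struct reservation {
	bool reserved;      /* did the node get a cma area at all? */
	int  backing_node;  /* node whose memory backs it, -1 if none */
};

/*
 * Toy model of per-node reservation. With exact_nid == false a
 * memoryless node falls back to the first node that has memory
 * (a simplification). Returns how many areas were reserved on a
 * node other than the one they were requested for.
 */
static int reserve_per_node(const bool *has_memory, int nr_nodes,
			    bool exact_nid, struct reservation *out)
{
	int fallback = -1;
	int leaked = 0;

	assert(has_memory != NULL && out != NULL && nr_nodes >= 0);

	for (int nid = 0; nid < nr_nodes; nid++) {
		if (has_memory[nid]) {
			fallback = nid;
			break;
		}
	}

	for (int nid = 0; nid < nr_nodes; nid++) {
		out[nid].reserved = false;
		out[nid].backing_node = -1;

		if (has_memory[nid]) {
			out[nid].reserved = true;
			out[nid].backing_node = nid;
		} else if (!exact_nid && fallback >= 0) {
			/* Area for a memoryless node, never used later. */
			out[nid].reserved = true;
			out[nid].backing_node = fallback;
			leaked++;
		}
	}
	return leaked;
}
```

In this model, four nodes with memory only on node0 give three leaked areas when exact_nid is false and none when it is true. With memory on node0 and node2, two areas leak when exact_nid is false.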
    • samples/vfs: avoid warning in statx override · c3eeaae9
      Kees Cook authored
      Something changed recently to uncover this warning:
      
        samples/vfs/test-statx.c:24:15: warning: `struct foo' declared inside parameter list will not be visible outside of this definition or declaration
           24 | #define statx foo
              |               ^~~
      
      Which is due to the use of "struct statx" (here, "struct foo") in a function
      prototype argument list before it has been defined:
      
       int
       # 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h"
          foo
       # 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4
                (int __dirfd, const char *__restrict __path, int __flags,
                  unsigned int __mask, struct
       # 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h"
                                             foo
       # 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4
                                                   *__restrict __buf)
         __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (2, 5)));
      
      Add explicit struct before #include to avoid warning.
      
      Fixes: f1b5618e ("vfs: Add a sample program for the new mount API")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Link: http://lkml.kernel.org/r/202006282213.C516EA6@keescook
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c3eeaae9
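
The warning in the commit above comes from a struct tag that first appears inside a prototype's parameter list, where its scope ends with the prototype. Below is a minimal sketch of the fix the commit describes, which is to declare the struct explicitly beforehand. It is not the actual samples/vfs/test-statx.c change; the function fill_foo() and the member value are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Declare the tag at file scope before any prototype mentions it. */
struct foo;

/*
 * Without the declaration above, "struct foo" here would be scoped to
 * this parameter list only, and the later definition of fill_foo()
 * would refer to a different, incompatible type.
 */
int fill_foo(struct foo *buf);

struct foo {
	int value;
};

int fill_foo(struct foo *buf)
{
	assert(buf != NULL);
	buf->value = 42;
	return 0;
}
```

Deleting the `struct foo;` line should bring back a warning like the one quoted in the commit when compiling with GCC.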