1. 26 Sep, 2018 1 commit
    • x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · dbfe2953
      Jiri Kosina authored
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
      sensitive' processes (such as GPG) from data leaking into a different
      userspace process via spectre v2.
      
      This is however completely insufficient to provide proper userspace-to-userspace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes a spectrev2
      victim.
      
      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
      processes where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
      
      Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
      Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
      dbfe2953
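
The rule described in the commit message above (issue the barrier only when the outgoing process could not simply ptrace the incoming one) can be sketched as a toy userspace model. This is not the kernel's code: the function names, the uid/dumpable access rule, and the "uid 0 may trace anything" shortcut are all assumptions made for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a ptrace access check. In this toy model an
 * attacker may trace a victim if it is root (uid 0), or if both run under
 * the same uid and the victim is dumpable. The real kernel check is more
 * involved; this only captures the shape of the decision.
 */
static bool may_ptrace_sketch(unsigned int attacker_uid,
                              unsigned int victim_uid,
                              bool victim_dumpable)
{
    if (attacker_uid == 0)
        return true;
    return attacker_uid == victim_uid && victim_dumpable;
}

/*
 * Decide whether a branch prediction barrier is worth issuing when
 * switching from the outgoing task (potential attacker) to the incoming
 * task (potential victim). If the attacker could ptrace the victim anyway,
 * poisoning branch buffers gains it nothing, so the barrier is skipped.
 */
static bool needs_ibpb_sketch(unsigned int prev_uid,
                              unsigned int next_uid,
                              bool next_dumpable)
{
    return !may_ptrace_sketch(prev_uid, next_uid, next_dumpable);
}
```

In this model, a switch between two dumpable processes of the same user skips the barrier, while a switch to another user's process or to a non-dumpable process issues it.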
  2. 23 Sep, 2018 1 commit
  3. 15 Sep, 2018 1 commit
  4. 12 Sep, 2018 1 commit
    • x86/pti/64: Remove the SYSCALL64 entry trampoline · bf904d27
      Andy Lutomirski authored
      The SYSCALL64 trampoline has a couple of nice properties:
      
       - The usual sequence of SWAPGS followed by two GS-relative accesses to
         set up RSP is somewhat slow because the GS-relative accesses need
         to wait for SWAPGS to finish.  The trampoline approach allows
         RIP-relative accesses to set up RSP, which avoids the stall.
      
       - The trampoline avoids any percpu access before CR3 is set up,
         which means that no percpu memory needs to be mapped in the user
         page tables.  This prevents using Meltdown to read any percpu memory
         outside the cpu_entry_area and prevents using timing leaks
         to directly locate the percpu areas.
      
      The downsides of using a trampoline may outweigh the upsides, however.
      It adds an extra non-contiguous I$ cache line to system calls, and it
      forces an indirect jump to transfer control back to the normal kernel
      text after CR3 is set up.  The latter is because x86 lacks a 64-bit
      direct jump instruction that could jump from the trampoline to the entry
      text.  With retpolines enabled, the indirect jump is extremely slow.
      
      Change the code to map the percpu TSS into the user page tables to allow
      the non-trampoline SYSCALL64 path to work under PTI.  This does not add a
      new direct information leak, since the TSS is readable by Meltdown from the
      cpu_entry_area alias regardless.  It does allow a timing attack to locate
      the percpu area, but KASLR is more or less a lost cause against local
      attack on CPUs vulnerable to Meltdown regardless.  As far as I'm concerned,
      on current hardware, KASLR is only useful to mitigate remote attacks that
      try to attack the kernel without first gaining RCE against a vulnerable
      user process.
      
      On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
      overhead from ~237ns to ~228ns.
      
      There is a possible alternative approach: Move the trampoline within 2G of
      the entry text and make a separate copy for each CPU.  This would allow a
      direct jump to rejoin the normal entry path. There are pros and cons for
      this approach:
      
       + It avoids a pipeline stall
      
       - It executes from an extra page and reads from another extra page during
         the syscall. The latter is because it needs to use a relative
         addressing mode to find sp1 -- it's the same *cacheline*, but accessed
         using an alias, so it's an extra TLB entry.
      
       - Slightly more memory. This would be one page per CPU for a simple
         implementation and 64-ish bytes per CPU or one page per node for a more
         complex implementation.
      
       - More code complexity.
      
      The current approach is chosen for simplicity and because the alternative
      does not provide a benefit significant enough to make it worthwhile.
      
      [ tglx: Added the alternative discussion to the changelog ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
      bf904d27
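
The "within 2G" constraint mentioned in the commit message above follows from the x86-64 near jump encoding: `jmp rel32` is 5 bytes long (opcode E9 plus a signed 32-bit displacement), and the displacement is counted from the address of the next instruction. A minimal sketch of that reachability check follows; the function name is hypothetical and the addresses in the usage note are illustrative values, not taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length in bytes of the near "jmp rel32" encoding: opcode E9 + imm32. */
#define JMP_REL32_LEN 5

/*
 * Return true if a direct "jmp rel32" placed at jmp_addr can reach target.
 * The displacement is relative to the end of the jump instruction and must
 * fit in a signed 32-bit integer. The unsigned-to-signed conversion relies
 * on two's complement behaviour, which holds on the platforms of interest.
 */
static bool direct_jump_reachable(uint64_t jmp_addr, uint64_t target)
{
    int64_t disp = (int64_t)(target - (jmp_addr + JMP_REL32_LEN));

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

For example, a jump from 0xffffffff81000000 to 0xffffffff81a00000 is reachable, while a jump from 0xfffffe0000001000 to the same target is not, which is the situation that forces an indirect jump.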
  5. 08 Sep, 2018 2 commits
  6. 06 Sep, 2018 4 commits
    • Merge tag 'apparmor-pr-2018-09-06' of... · db44bf4b
      Linus Torvalds authored
      Merge tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
      
      Pull apparmor fix from John Johansen:
       "A fix for an issue syzbot discovered last week:
      
         - Fix for bad debug check when converting secids to secctx"
      
      * tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: fix bad debug check in apparmor_secid_to_secctx()
      db44bf4b
    • Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · be65e259
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "This fixes two annoying bugs:
      
         - The first one is a side effect caused by using SRCU for rcuidle
           tracepoints. It seems that perf was depending on the rcuidle
           tracepoints to make RCU watch when it wasn't.
      
           The real fix will be to have perf use SRCU instead of depending on
           RCU watching, but that can't be done until SRCU is safe to use in
           NMI context (Paul's working on that).
      
         - The second bug fix is for a bug that's been periodically making my
           tests fail randomly for some time. I haven't had time to track it
           down, but finally have. It has to do with stressing NMIs (via perf)
           while enabling or disabling ftrace function handling with lockdep
           enabled.
      
           If an interrupt happens and just as it returns, it sets lockdep
           back to "interrupts enabled" but before it returns an NMI is
           triggered, and if this happens while printk_nmi_enter has a
           breakpoint attached to it (because ftrace is converting it to or
           from nop to call fentry), the breakpoint trap also calls into
           lockdep, and since returning from the NMI to an interrupt handler,
           interrupts were disabled when the NMI went off, lockdep keeps its
           state as interrupts disabled when it returns back from the
           interrupt handler where interrupts are enabled.
      
           This causes lockdep_assert_irqs_enabled() to trigger a false
           positive"
      
      * tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        printk/tracing: Do not trace printk_nmi_enter()
        tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
      be65e259
    • Merge tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 5404525b
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - fix for improper fsync after hardlink
      
       - fix for a corruption during file deduplication
      
       - use after free fixes
      
       - RCU warning fix
      
       - fix for buffered write to nodatacow file
      
      * tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu
        btrfs: use after free in btrfs_quota_enable
        btrfs: btrfs_shrink_device should call commit transaction at the end
        btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata
        Btrfs: fix data corruption when deduplicating between different files
        Btrfs: sync log after logging new name
        Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
      5404525b
    • printk/tracing: Do not trace printk_nmi_enter() · d1c392c9
      Steven Rostedt (VMware) authored
      I hit the following splat in my tests:
      
      ------------[ cut here ]------------
      IRQs not enabled as expected
      WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
      Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      EIP: tick_nohz_idle_enter+0x44/0x8c
      Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
      75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
      e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
      EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
      ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
      CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
      Call Trace:
       do_idle+0x33/0x202
       cpu_startup_entry+0x61/0x63
       start_secondary+0x18e/0x1ed
       startup_32_smp+0x164/0x168
      irq event stamp: 18773830
      hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
      hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
      softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
      softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
      ---[ end trace b7c64aa79e17954a ]---
      
      After a bit of debugging, I found what was happening. This would trigger
      when performing "perf" with a high NMI interrupt rate, while enabling and
      disabling function tracer. Ftrace uses breakpoints to convert the nops at
      the start of functions to calls to the function trampolines. The breakpoint
      traps disable interrupts and this makes calls into lockdep via the
      trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
      
        do_idle {
      
          [interrupts enabled]
      
          <interrupt> [interrupts disabled]
      	TRACE_IRQS_OFF [lockdep says irqs off]
      	[...]
      	TRACE_IRQS_IRET
      	    test if pt_regs say return to interrupts enabled [yes]
      	    TRACE_IRQS_ON [lockdep says irqs are on]
      
      	    <nmi>
      		nmi_enter() {
      		    printk_nmi_enter() [traced by ftrace]
      		    [ hit ftrace breakpoint ]
      		    <breakpoint exception>
      			TRACE_IRQS_OFF [lockdep says irqs off]
      			[...]
      			TRACE_IRQS_IRET [return from breakpoint]
      			   test if pt_regs say interrupts enabled [no]
      			   [iret back to interrupt]
      	   [iret back to code]
      
          tick_nohz_idle_enter() {
      
      	lockdep_assert_irqs_enabled() [lockdep say no!]
      
      Although interrupts are indeed enabled, lockdep thinks they are not, and since
      we now do asserts via lockdep, it gives a false warning. The issue here is
      that printk_nmi_enter() is called before lockdep_off(), which disables
      lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
      printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
      confused.
      
      Cc: stable@vger.kernel.org
      Fixes: 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI")
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d1c392c9
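
The sequence in the commit message above can be replayed with a small state model. This is a sketch under stated assumptions, not kernel code: `struct cpu_sketch` and all function names are made up for illustration, and the only behaviour modelled is that TRACE_IRQS_OFF marks lockdep's view as "off" and TRACE_IRQS_IRET marks it "on" only when the saved flags say interrupts were enabled.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU: the real IRQ flag and lockdep's belief. */
struct cpu_sketch {
    bool hw_irqs_on;
    bool lockdep_irqs_on;
};

/* Models TRACE_IRQS_OFF: lockdep records that interrupts are off. */
static void trace_irqs_off(struct cpu_sketch *c)
{
    c->lockdep_irqs_on = false;
}

/* Models TRACE_IRQS_IRET: lockdep is updated only if we return to irqs-on. */
static void trace_irqs_iret(struct cpu_sketch *c, bool saved_irqs_on)
{
    if (saved_irqs_on)
        c->lockdep_irqs_on = true;
}

/*
 * Replay the scenario. breakpoint_hit_in_nmi models printk_nmi_enter()
 * being traced while ftrace is patching it, so the NMI takes a breakpoint
 * trap before lockdep has been switched off. Returns true if lockdep's
 * view matches the hardware state once control is back in do_idle().
 */
static bool lockdep_matches_hardware(bool breakpoint_hit_in_nmi)
{
    struct cpu_sketch c = { .hw_irqs_on = true, .lockdep_irqs_on = true };

    /* <interrupt>: taken with interrupts enabled, handler runs with them off. */
    bool saved_for_irq = c.hw_irqs_on;
    c.hw_irqs_on = false;
    trace_irqs_off(&c);

    /* Interrupt return path: pt_regs say "return to interrupts enabled". */
    trace_irqs_iret(&c, saved_for_irq);

    /* <nmi> arrives before the iret executes; interrupts are still off. */
    if (breakpoint_hit_in_nmi) {
        bool saved_for_bp = c.hw_irqs_on;   /* false */
        trace_irqs_off(&c);                 /* lockdep flips back to off */
        trace_irqs_iret(&c, saved_for_bp);  /* no update: saved state is off */
    }

    /* iret back to the interrupted code restores the hardware flag. */
    c.hw_irqs_on = saved_for_irq;

    return c.lockdep_irqs_on == c.hw_irqs_on;
}
```

With the breakpoint path taken, the model ends with interrupts enabled in hardware but disabled in lockdep's view, which corresponds to the false warning; without it, the two agree.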
  7. 05 Sep, 2018 6 commits
    • Merge tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · b36fdc68
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "Some GPIO fixes. The ACPI stuff is probably the most annoying for
        users that get fixed this time.
      
         - Atomic contexts, cansleep* calls and such fastpath/slowpath
           things.
      
         - Defer ACPI event handler registration to late_initcall() so IRQs do
           not fire in our face before other drivers have a chance to register
           handlers.
      
         - Race condition if a consumer requests a GPIO after
           gpiochip_add_data_with_key() but before of_gpiochip_add()
      
         - Probe errorpath in the dwapb driver"
      
      * tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: Fix crash due to registration race
        gpio: dwapb: Fix error handling in dwapb_gpio_probe()
        gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall
        gpiolib: acpi: Switch to cansleep version of GPIO library call
        gpio: adp5588: Fix sleep-in-atomic-context bug
      b36fdc68
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f4697d9a
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "A set of very minor fixes and a couple of reverts to fix a major
        problem (the attempt to change the busy count causes a hang when
        attempting to change the drive cache type)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: aacraid: fix a signedness bug
        Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq"
        Revert "scsi: core: fix scsi_host_queue_ready"
        scsi: libata: Add missing newline at end of file
        scsi: target: iscsi: cxgbit: use pr_debug() instead of pr_info()
        scsi: hpsa: limit transfer length to 1MB, not 512kB
        scsi: lpfc: Correct MDS diag and nvmet configuration
        scsi: lpfc: Default fdmi_on to on
        scsi: csiostor: fix incorrect port capabilities
        scsi: csiostor: add a check for NULL pointer after kmalloc()
        scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters
        scsi: core: Update SCSI_MQ_DEFAULT help text to match default
      f4697d9a
    • Merge tag 'nds32-for-linus-4.19-tag1' of... · d0c1db1d
      Linus Torvalds authored
      Merge tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
      
      Pull nds32 updates from Greentime Hu:
       "Contained in here are the bug fixes, building error fixes and ftrace
        support for nds32"
      
      * tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
        nds32: linker script: GCOV kernel may refers data in __exit
        nds32: fix build error because of wrong semicolon
        nds32: Fix a kernel panic issue because of wrong frame pointer access.
        nds32: Only print one page of stack when die to prevent printing too much information.
        nds32: Add macro definition for offset of lp register on stack
        nds32: Remove the deprecated ABI implementation
        nds32/stack: Get real return address by using ftrace_graph_ret_addr
        nds32/ftrace: Support dynamic function graph tracer
        nds32/ftrace: Support dynamic function tracer
        nds32/ftrace: Add RECORD_MCOUNT support
        nds32/ftrace: Support static function graph tracer
        nds32/ftrace: Support static function tracer
        nds32: Extract the checking and getting pointer to a macro
        nds32: Clean up the coding style
        nds32: Fix get_user/put_user macro expand pointer problem
        nds32: Fix empty call trace
        nds32: add NULL entry to the end of_device_id array
        nds32: fix logic for module
      d0c1db1d
    • tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints · 865e63b0
      Steven Rostedt (VMware) authored
      Borislav reported the following splat:
      
       =============================
       WARNING: suspicious RCU usage
       4.19.0-rc1+ #1 Not tainted
       -----------------------------
       ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
      
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 2, debug_locks = 1
       RCU used illegally from extended quiescent state!
       1 lock held by swapper/0/0:
        #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130
      
       stack backtrace:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1
       Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
       Call Trace:
        dump_stack+0x85/0xcb
        perf_event_output_forward+0xf6/0x130
        __perf_event_overflow+0x52/0xe0
        perf_swevent_overflow+0x91/0xb0
        perf_tp_event+0x11a/0x350
        ? find_held_lock+0x2d/0x90
        ? __lock_acquire+0x2ce/0x1350
        ? __lock_acquire+0x2ce/0x1350
        ? retint_kernel+0x2d/0x2d
        ? find_held_lock+0x2d/0x90
        ? tick_nohz_get_sleep_length+0x83/0xb0
        ? perf_trace_cpu+0xbb/0xd0
        ? perf_trace_buf_alloc+0x5a/0xa0
        perf_trace_cpu+0xbb/0xd0
        cpuidle_enter_state+0x185/0x340
        do_idle+0x1eb/0x260
        cpu_startup_entry+0x5f/0x70
        start_kernel+0x49b/0x4a6
        secondary_startup_64+0xa4/0xb0
      
      This is due to the tracepoints moving to SRCU usage which does not require
      RCU to be "watching". But perf uses these tracepoints with RCU and expects
      it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
      calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
      working in NMI context, and then perf can be converted to use that instead
      of normal RCU.
      
      Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home
      
      Cc: x86-ml <x86@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Tested-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Fixes: e6753f23 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      865e63b0
    • nds32: linker script: GCOV kernel may refers data in __exit · 3350139c
      Greentime Hu authored
      This patch fixes an nds32 allmodconfig/allyesconfig build error. A GCOV
      kernel embeds counters in the kernel for each line, and some of them are
      embedded in __exit text. So we need to keep EXIT_TEXT and EXIT_DATA if
      CONFIG_GCOV_KERNEL=y.
      
      Link: https://lkml.org/lkml/2018/9/1/125
      Signed-off-by: Greentime Hu <greentime@andestech.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      3350139c
    • Merge branch 'akpm' (patches from Andrew) · 0e9b1039
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "17 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        nilfs2: convert to SPDX license tags
        drivers/dax/device.c: convert variable to vm_fault_t type
        lib/Kconfig.debug: fix three typos in help text
        checkpatch: add __ro_after_init to known $Attribute
        mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
        uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
        memory_hotplug: fix kernel_panic on offline page processing
        checkpatch: add optional static const to blank line declarations test
        ipc/shm: properly return EIDRM in shm_lock()
        mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
        mm/util.c: improve kvfree() kerneldoc
        tools/vm/page-types.c: fix "defined but not used" warning
        tools/vm/slabinfo.c: fix sign-compare warning
        kmemleak: always register debugfs file
        mm: respect arch_dup_mmap() return value
        mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
        mm: memcontrol: print proper OOM header when no eligible victim left
      0e9b1039
  8. 04 Sep, 2018 24 commits
    • nilfs2: convert to SPDX license tags · ae98043f
      Ryusuke Konishi authored
      Remove the verbose license text from NILFS2 files and replace them with
      SPDX tags.  This does not change the license of any of the code.
      
      Link: http://lkml.kernel.org/r/1535624528-5982-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae98043f
    • drivers/dax/device.c: convert variable to vm_fault_t type · 36bdac1e
      Souptick Joarder authored
      As part of 226ab561 ("device-dax: Convert to vmf_insert_mixed and
      vm_fault_t") in 4.19-rc1, 'rc' was not converted to vm_fault_t.  Now
      converted.
      
      Link: http://lkml.kernel.org/r/20180830153813.GA26059@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36bdac1e
    • Thibaut Sautereau's avatar
    • checkpatch: add __ro_after_init to known $Attribute · c5967e98
      Joe Perches authored
      __ro_after_init is a specific __attribute__ that checkpatch does currently
      not understand.
      
      Add it to the known $Attribute types so that variables declared with
      __ro_after_init are not mistaken for a modifier type.
      
      This appears as a defect in checkpatch output of code like:
      
      static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
      [...]
             if (trust_cpu && arch_init) {
      
      where checkpatch reports:
      
      ERROR: space prohibited after that '&&' (ctx:WxW)
      	if (trust_cpu && arch_init) {
      
      Link: http://lkml.kernel.org/r/0fa8a2cb83ade4c525e18261ecf6cfede3015983.camel@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Reported-by: Kees Cook <keescook@chromium.org>
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c5967e98
    • mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal · 62ec0d8c
      Dave Jiang authored
      It looks like I missed the PUD path when doing VM_MIXEDMAP removal.
      This can be triggered by:
      1. Boot with memmap=4G!8G
      2. build ndctl with destructive flag on
      3. make TESTS=device-dax check
      
      [  +0.000675] kernel BUG at mm/huge_memory.c:824!
      
      Applying the same change that was applied to vmf_insert_pfn_pmd() in the
      original patch.
      
      Link: http://lkml.kernel.org/r/153565957352.35524.1005746906902065126.stgit@djiang5-desk3.ch.intel.com
      Fixes: e1fb4a08 ("dax: remove VM_MIXEDMAP for fsdax and device dax")
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Reported-by: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: Vishal Verma <vishal.l.verma@intel.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62ec0d8c
    • uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name · 8a2336e5
      Randy Dunlap authored
      Since this header is in "include/uapi/linux/", apparently people want to
      use it in userspace programs -- even in C++ ones.  However, the header
      uses a C++ reserved keyword ("private"), so change that to "dh_private"
      instead to allow the header file to be used in C++ userspace.
      
      Fixes https://bugzilla.kernel.org/show_bug.cgi?id=191051
      Link: http://lkml.kernel.org/r/0db6c314-1ef4-9bfa-1baa-7214dd2ee061@infradead.org
      Fixes: ddbb4114 ("KEYS: Add KEYCTL_DH_COMPUTE command")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a2336e5
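
One way a header could introduce the new member name without breaking existing C callers is an anonymous union that hides the old name from C++ compilers. The sketch below is an assumption made for illustration: the commit message above only says the member is renamed to "dh_private", and the struct name, the helper function, and the use of `int32_t` instead of kernel types are all hypothetical. It relies on C11 anonymous unions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a uapi parameter struct; not the real header. */
struct dh_params_sketch {
    union {
#ifndef __cplusplus
        int32_t private;    /* old name: valid in C, a keyword in C++ */
#endif
        int32_t dh_private; /* new name: usable from both C and C++ */
    };
    int32_t prime;
    int32_t base;
};

#ifndef __cplusplus
/*
 * C-only helper showing that both names refer to the same storage: a value
 * written through the new name is read back through the old one.
 */
static int32_t read_through_old_name(int32_t key_id)
{
    struct dh_params_sketch p;

    p.dh_private = key_id;
    p.prime = 0;
    p.base = 0;
    return p.private;
}
#endif
```

Because both names alias one 32-bit slot, the struct layout and size stay the same as with a single member.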
    • memory_hotplug: fix kernel_panic on offline page processing · 4e8346d0
      Mikhail Zaslonko authored
      Within show_valid_zones() the function test_pages_in_a_zone() should be
      called for online memory blocks only.
      
      Otherwise it might lead to the VM_BUG_ON due to uninitialized struct
      pages (when CONFIG_DEBUG_VM_PGFLAGS kernel option is set):
      
       page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
       ------------[ cut here ]------------
       Call Trace:
       ([<000000000038f91e>] test_pages_in_a_zone+0xe6/0x168)
        [<0000000000923472>] show_valid_zones+0x5a/0x1a8
        [<0000000000900284>] dev_attr_show+0x3c/0x78
        [<000000000046f6f0>] sysfs_kf_seq_show+0xd0/0x150
        [<00000000003ef662>] seq_read+0x212/0x4b8
        [<00000000003bf202>] __vfs_read+0x3a/0x178
        [<00000000003bf3ca>] vfs_read+0x8a/0x148
        [<00000000003bfa3a>] ksys_read+0x62/0xb8
        [<0000000000bc2220>] system_call+0xdc/0x2d8
      
      That VM_BUG_ON was triggered by the page poisoning introduced in
      mm/sparse.c with the git commit d0dc12e8 ("mm/memory_hotplug:
      optimize memory hotplug").
      
      With the same commit the new 'nid' field has been added to the struct
      memory_block in order to store and later on derive the node id for
      offline pages (instead of accessing struct page which might be
      uninitialized).  But one reference to nid in show_valid_zones() function
      has been overlooked.  Fixed with current commit.  Also, nr_pages will
      not be used any more after test_pages_in_a_zone() call, do not update
      it.
      
      Link: http://lkml.kernel.org/r/20180828090539.41491-1-zaslonko@linux.ibm.com
      Fixes: d0dc12e8 ("mm/memory_hotplug: optimize memory hotplug")
      Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: <stable@vger.kernel.org>	[4.17+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e8346d0
    • checkpatch: add optional static const to blank line declarations test · 328b5f41
      Joe Perches authored
      Using a static const struct definition as part of a series of
      declarations produces a false positive "Missing a blank line after
      declarations" for code like:
      
        WARNING: Missing a blank line after declarations
        #710: FILE: drivers/gpu/drm/tidss/tidss_scale_coefs.c:137:
        +       int inc;
        +       static const struct {
      
      So fix it.
      
      Link: http://lkml.kernel.org/r/5905126e70b0ed1781e49265fd5c49c5090d0223.camel@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Reported-by: Jyri Sarha <jsarha@ti.com>
      Cc: "Valkeinen, Tomi" <tomi.valkeinen@ti.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      328b5f41
    • ipc/shm: properly return EIDRM in shm_lock() · 9c21dae2
      Davidlohr Bueso authored
      When getting rid of the general ipc_lock(), this was missed, which
      makes the comment around the ipc object validity check bogus.  Under
      EIDRM conditions, callers will in turn not see the error and will continue
      with the operation.
      
      Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5
      Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian
      Fixes: 82061c57 ("ipc: drop ipc_lock()")
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c21dae2
    • mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported. · 464c7ffb
      Aneesh Kumar K.V authored
      When scanning for movable pages, filter out hugetlb pages if hugepage
      migration is not supported.  Without this we hit an infinite loop in
      __offline_pages() where we do
      
      	pfn = scan_movable_pages(start_pfn, end_pfn);
      	if (pfn) { /* We have movable pages */
      		ret = do_migrate_range(pfn, end_pfn);
      		goto repeat;
      	}
      
      Fix this by checking hugepage_migration_supported() both in
      has_unmovable_pages(), which is the primary backoff mechanism for page
      offlining, and, for consistency, in scan_movable_pages(), because it
      doesn't make any sense to return a pfn for a non-migratable huge page.
      
      This issue was revealed by, but not caused by 72b39cfc ("mm,
      memory_hotplug: do not fail offlining too early").
      
      Link: http://lkml.kernel.org/r/20180824063314.21981-1-aneesh.kumar@linux.ibm.com
      Fixes: 72b39cfc ("mm, memory_hotplug: do not fail offlining too early")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: Haren Myneni <haren@linux.vnet.ibm.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      464c7ffb
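      The filtering described above can be sketched in userspace as follows.
      This is only an illustration under stated assumptions: struct toy_page
      and toy_scan_movable() are hypothetical stand-ins, not the kernel's
      struct page or scan_movable_pages().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct toy_page {
    bool in_use;     /* page still holds data that would have to be moved */
    bool is_hugetlb; /* page belongs to a huge page */
};

/*
 * Return the 1-based index of the first page the caller should try to
 * migrate, or 0 if there is none. When huge page migration is not
 * supported, huge pages are skipped, so the caller is never handed a
 * page whose migration cannot succeed and never retries forever.
 */
size_t toy_scan_movable(const struct toy_page *pages, size_t n,
                        bool hugepage_migration_supported)
{
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].in_use)
            continue;
        if (pages[i].is_hugetlb && !hugepage_migration_supported)
            continue; /* filtered out: reporting it would cause a retry loop */
        return i + 1;
    }
    return 0;
}
```

      With the filter in place, a "scan, migrate, repeat" loop only repeats
      for pages that can actually be moved.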
    • Andrew Morton's avatar
      mm/util.c: improve kvfree() kerneldoc · 04b8e946
      Andrew Morton authored
      Scooped from an email from Matthew.
      
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04b8e946
    • Naoya Horiguchi's avatar
      tools/vm/page-types.c: fix "defined but not used" warning · 7ab660f8
      Naoya Horiguchi authored
      debugfs_known_mountpoints[] is not used any more, so let's remove it.
      
      Link: http://lkml.kernel.org/r/1535102651-19418-1-git-send-email-n-horiguchi@ah.jp.nec.com
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ab660f8
    • Naoya Horiguchi's avatar
      tools/vm/slabinfo.c: fix sign-compare warning · 90450656
      Naoya Horiguchi authored
      Currently we get the following compiler warning:
      
          slabinfo.c:854:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             if (s->object_size < min_objsize)
                                ^
      
      due to the mismatch of signed/unsigned comparison.  ->object_size and
      ->slab_size are never expected to be negative, so let's define them as
      unsigned int.
      
      [n-horiguchi@ah.jp.nec.com: convert everything - none of these can be negative]
        Link: http://lkml.kernel.org/r/20180826234947.GA9787@hori1.linux.bs1.fc.nec.co.jp
      Link: http://lkml.kernel.org/r/1535103134-20239-1-git-send-email-n-horiguchi@ah.jp.nec.com
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      90450656
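      A short sketch of why the sign-compare warning is worth fixing: in a
      mixed comparison, C converts the int operand to unsigned int, so a
      negative value compares as a very large one. The function names below
      are hypothetical and are not taken from slabinfo.c.

```c
#include <stdbool.h>

/*
 * Mixed-sign comparison. The explicit cast spells out the conversion
 * that the compiler would otherwise perform implicitly (and warn about
 * under -Wsign-compare).
 */
bool converted_less(int object_size, unsigned int min_objsize)
{
    return (unsigned int)object_size < min_objsize;
}

/* Same comparison once both sides are declared unsigned. */
bool unsigned_less(unsigned int object_size, unsigned int min_objsize)
{
    return object_size < min_objsize;
}
```

      Declaring values that can never be negative as unsigned int, as the
      commit does, removes the conversion and the warning together.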
    • Vincent Whitchurch's avatar
      kmemleak: always register debugfs file · b353756b
      Vincent Whitchurch authored
      If kmemleak is built into the kernel, but is disabled by default, the
      debugfs file is never registered.  Because of this, it is not possible
      to find out if the kernel is built with kmemleak support by checking for
      the presence of this file.  To allow this, always register the file.
      
      After this patch, if the file doesn't exist, kmemleak is not available
      in the kernel.  If writing "scan" or any other value than "clear" to
      this file results in EBUSY, then kmemleak is available but is disabled
      by default and can be activated via the kernel command line.
      
      Catalin: "that's also consistent with a late disabling of kmemleak when
      the debugfs entry sticks around."
      
      Link: http://lkml.kernel.org/r/20180824131220.19176-1-vincent.whitchurch@axis.com
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b353756b
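      The detection rules stated in this commit message could be probed from
      userspace roughly as below. This is a sketch, not part of the patch:
      the enum and probe_kmemleak() are hypothetical names, the caller is
      assumed to pass the kmemleak debugfs path (usually
      /sys/kernel/debug/kmemleak, with debugfs mounted), and writing "scan"
      starts a real scan when kmemleak is active.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical classification of what the probe observed. */
enum kmemleak_state {
    KMEMLEAK_NOT_AVAILABLE, /* file does not exist */
    KMEMLEAK_DISABLED,      /* write failed with EBUSY */
    KMEMLEAK_ACTIVE,        /* write was accepted */
    KMEMLEAK_UNKNOWN        /* anything else, e.g. permission denied */
};

enum kmemleak_state probe_kmemleak(const char *path)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno == ENOENT ? KMEMLEAK_NOT_AVAILABLE : KMEMLEAK_UNKNOWN;

    ssize_t n = write(fd, "scan", 4);
    int err = errno; /* save before close() can overwrite it */
    close(fd);

    if (n == 4)
        return KMEMLEAK_ACTIVE;
    if (n < 0 && err == EBUSY)
        return KMEMLEAK_DISABLED;
    return KMEMLEAK_UNKNOWN;
}
```

      The probe normally needs root, since debugfs is usually not accessible
      to unprivileged users; in that case it reports KMEMLEAK_UNKNOWN.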
    • Nadav Amit's avatar
      mm: respect arch_dup_mmap() return value · 1ed0cc5a
      Nadav Amit authored
      Commit d70f2a14 ("include/linux/sched/mm.h: uninline mmdrop_async(),
      etc") ignored the return value of arch_dup_mmap(). As a result, on x86,
      a failure to duplicate the LDT (e.g. due to memory allocation error)
      would leave the duplicated memory mapping in an inconsistent state.
      
      Fix by using the return value, as it was before the change.
      
      Link: http://lkml.kernel.org/r/20180823051229.211856-1-namit@vmware.com
      Fixes: d70f2a14 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1ed0cc5a
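      The difference between dropping and propagating a hook's return value
      can be shown with a toy model. All names here are hypothetical;
      toy_arch_dup() merely stands in for an architecture hook that can fail
      with -ENOMEM.

```c
#include <errno.h>

/* Hypothetical stand-in for an arch hook that may fail. */
static int toy_arch_dup(int simulate_failure)
{
    return simulate_failure ? -ENOMEM : 0;
}

/* Buggy pattern: the hook's result is dropped and success is reported. */
int toy_dup_ignoring(int simulate_failure)
{
    (void)toy_arch_dup(simulate_failure);
    return 0;
}

/* Fixed pattern: the error reaches the caller, which can then unwind. */
int toy_dup_checked(int simulate_failure)
{
    int ret = toy_arch_dup(simulate_failure);

    if (ret)
        return ret;
    return 0;
}
```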
    • Tetsuo Handa's avatar
      mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm(). · 79cc8105
      Tetsuo Handa authored
      Commit 93065ac7 ("mm, oom: distinguish blockable mode for mmu
      notifiers") has added an ability to skip over vmas with blockable mmu
      notifiers. This however didn't call tlb_finish_mmu as it should.
      
      As a result, inc_tlb_flush_pending has been called without its pairing
      dec_tlb_flush_pending, and all callers of mm_tlb_flush_pending would
      flush even though this is not really needed.  This alone is not harmful,
      and it seems there shouldn't be any such callers for oom victims at all,
      but there is no real reason to skip tlb_finish_mmu on the early skip
      either, so call it.
      
      [mhocko@suse.com: new changelog]
      Link: http://lkml.kernel.org/r/b752d1d5-81ad-7a35-2394-7870641be51c@i-love.sakura.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79cc8105
    • Johannes Weiner's avatar
      mm: memcontrol: print proper OOM header when no eligible victim left · 3100dab2
      Johannes Weiner authored
      When the memcg OOM killer runs out of killable tasks, it currently
      prints a WARN with no further OOM context.  This has caused some user
      confusion.
      
      Warnings indicate a kernel problem.  In a reported case, however, the
      situation was triggered by a nonsensical memcg configuration (hard limit
      set to 0).  But without any VM context this wasn't obvious from the
      report, and it took some back and forth on the mailing list to identify
      what is actually a trivial issue.
      
      Handle this OOM condition like we handle it in the global OOM killer:
      dump the full OOM context and tell the user we ran out of tasks.
      
      This way the user can identify misconfigurations easily by themselves
      and rectify the problem - without having to go through the hassle of
      running into an obscure but unsettling warning, finding the appropriate
      kernel mailing list and waiting for a kernel developer to remote-analyze
      that the memcg configuration caused this.
      
      If users cannot make sense of why the OOM killer was triggered or why it
      failed, they will still report it to the mailing list, we know that from
      experience.  So in case there is an actual kernel bug causing this,
      kernel developers will very likely hear about it.
      
      Link: http://lkml.kernel.org/r/20180821160406.22578-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3100dab2
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 28619527
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Must perform TXQ teardown before unregistering interfaces in
          mac80211, from Toke Høiland-Jørgensen.
      
       2) Don't allow creating mac80211_hwsim with less than one channel, from
          Johannes Berg.
      
       3) Division by zero in cfg80211, fix from Johannes Berg.
      
       4) Fix endian issue in tipc, from Haiqing Bai.
      
       5) BPF sockmap use-after-free fixes from Daniel Borkmann.
      
       6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
      
       7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
      
       8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
          kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
      
       9) Fix deadlock in hv_netvsc, from Dexuan Cui.
      
      10) Do not restart timewait timer on RST, from Florian Westphal.
      
      11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
      
      12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.
      
      13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
      
      14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
      
      15) Use-after-free in act_ife, fix from Cong Wang.
      
      16) Missing hold to meta module in act_ife, from Vlad Buslov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
        net: phy: sfp: Handle unimplemented hwmon limits and alarms
        net: sched: action_ife: take reference to meta module
        act_ife: fix a potential use-after-free
        net/mlx5: Fix SQ offset in QPs with small RQ
        tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
        tipc: correct spelling errors for struct tipc_bc_base's comment
        bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
        bnxt_en: Clean up unused functions.
        bnxt_en: Fix firmware signaled resource change logic in open.
        sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
        sctp: fix invalid reference to the index variable of the iterator
        net/ibm/emac: wrong emac_calc_base call was used by typo
        net: sched: null actions array pointer before releasing action
        vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
        r8169: add support for NCube 8168 network card
        ip6_tunnel: respect ttl inherit for ip6tnl
        mac80211: shorten the IBSS debug messages
        mac80211: don't Tx a deauth frame if the AP forbade Tx
        mac80211: Fix station bandwidth setting after channel switch
        mac80211: fix a race between restart and CSA flows
        ...
      28619527
    • Andrew Lunn's avatar
      net: phy: sfp: Handle unimplemented hwmon limits and alarms · a33710bd
      Andrew Lunn authored
      Not all SFPs implement the registers containing sensor limits and
      alarms. Luckily, there is a bit indicating if they are implemented or
      not. Add checking for this bit, when deciding if the hwmon attributes
      should be visible.
      
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a33710bd
    • Vlad Buslov's avatar
      net: sched: action_ife: take reference to meta module · 84cb8eb2
      Vlad Buslov authored
      Recent refactoring of add_metainfo() caused use_all_metadata() to add
      metainfo to ife action metalist without taking reference to module. This
      causes warning in module_put called from ife action cleanup function.
      
      Implement add_metainfo_and_get_ops() function that returns with reference
      to module taken if metainfo was added successfully, and call it from
      use_all_metadata(), instead of calling __add_metainfo() directly.
      
      Example warning:
      
      [  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
      [  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
      [  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
      [  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  646.440595] RIP: 0010:module_put+0x1cb/0x230
      [  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
      [  646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
      [  646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
      [  646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
      [  646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
      [  646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
      [  646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
      [  646.508213] FS:  00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
      [  646.516961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
      [  646.530595] Call Trace:
      [  646.533408]  ? find_symbol_in_section+0x260/0x260
      [  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
      [  646.543695]  tcf_action_cleanup+0x29/0xa0
      [  646.548078]  __tcf_action_put+0x5a/0xb0
      [  646.552289]  ? nla_put+0x65/0xe0
      [  646.555889]  __tcf_idr_release+0x48/0x60
      [  646.560187]  tcf_generic_walker+0x448/0x6b0
      [  646.564764]  ? tcf_action_dump_1+0x450/0x450
      [  646.569411]  ? __lock_is_held+0x84/0x110
      [  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
      [  646.578982]  tca_action_gd+0x972/0xc40
      [  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
      [  646.588471]  ? mark_lock+0xcf/0x980
      [  646.592324]  ? check_chain_key+0x140/0x1f0
      [  646.596832]  ? debug_show_all_locks+0x240/0x240
      [  646.601839]  ? memset+0x1f/0x40
      [  646.605350]  ? nla_parse+0xca/0x1a0
      [  646.609217]  tc_ctl_action+0x215/0x230
      [  646.613339]  ? tcf_action_add+0x220/0x220
      [  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
      [  646.622227]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.626466]  netlink_rcv_skb+0x18d/0x200
      [  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
      [  646.634959]  ? netlink_ack+0x500/0x500
      [  646.639106]  netlink_unicast+0x2d0/0x370
      [  646.643409]  ? netlink_attachskb+0x340/0x340
      [  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
      [  646.652870]  ? import_iovec+0x11e/0x1c0
      [  646.657083]  netlink_sendmsg+0x3b9/0x6a0
      [  646.661388]  ? netlink_unicast+0x370/0x370
      [  646.665877]  ? netlink_unicast+0x370/0x370
      [  646.670351]  sock_sendmsg+0x6b/0x80
      [  646.674212]  ___sys_sendmsg+0x4a1/0x520
      [  646.678443]  ? copy_msghdr_from_user+0x210/0x210
      [  646.683463]  ? lock_downgrade+0x320/0x320
      [  646.687849]  ? debug_show_all_locks+0x240/0x240
      [  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
      [  646.697418]  ? _raw_spin_unlock+0x24/0x30
      [  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
      [  646.706619]  ? __pmd_alloc+0x320/0x320
      [  646.710738]  ? debug_show_all_locks+0x240/0x240
      [  646.715649]  ? restore_nameidata+0x7b/0xa0
      [  646.720117]  ? check_chain_key+0x140/0x1f0
      [  646.724590]  ? check_chain_key+0x140/0x1f0
      [  646.729070]  ? __fget_light+0xbc/0xd0
      [  646.733121]  ? __sys_sendmsg+0xd7/0x150
      [  646.737329]  __sys_sendmsg+0xd7/0x150
      [  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
      [  646.746003]  ? up_read+0x53/0x90
      [  646.749601]  ? __do_page_fault+0x484/0x780
      [  646.754105]  ? do_syscall_64+0x1e/0x2c0
      [  646.758320]  do_syscall_64+0x72/0x2c0
      [  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  646.767776] RIP: 0033:0x7f4163872150
      [  646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
      [  646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
      [  646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
      [  646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
      [  646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
      [  646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
      [  646.837360] irq event stamp: 6083
      [  646.841043] hardirqs last  enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500
      [  646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  646.859775] softirqs last  enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee
      [  646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife]
      [  646.878845] ---[ end trace b1b8c12ffe51e657 ]---
      
      Fixes: 5ffe57da ("act_ife: fix a potential deadlock")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84cb8eb2
    • Cong Wang's avatar
      act_ife: fix a potential use-after-free · 6d784f16
      Cong Wang authored
      Immediately after module_put(), user could delete this
      module, so e->ops could be already freed before we call
      e->ops->release().
      
      Fix this by moving module_put() after ops->release().
      
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d784f16
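      The ordering problem fixed here can be modelled in a few lines. This is
      a toy model, not the act_ife code: struct toy_module, struct toy_ops and
      both cleanup functions are hypothetical, and "unloaded" simply marks the
      point after which touching the ops would be a use-after-free.

```c
#include <stdbool.h>

/* Hypothetical module with a reference count. */
struct toy_module {
    int refcount;
    bool unloaded; /* set when the last reference is dropped */
};

struct toy_ops {
    struct toy_module *owner;
    int released; /* counts calls standing in for ops->release() */
};

static void toy_module_put(struct toy_module *m)
{
    if (--m->refcount == 0)
        m->unloaded = true; /* models the user removing the module at once */
}

/* Buggy order: drop the reference first, then use the ops. */
bool toy_cleanup_buggy(struct toy_ops *ops)
{
    toy_module_put(ops->owner);
    bool safe = !ops->owner->unloaded; /* false: ops may already be gone */
    ops->released++;
    return safe;
}

/* Fixed order: use the ops while the reference is still held. */
bool toy_cleanup_fixed(struct toy_ops *ops)
{
    bool safe = !ops->owner->unloaded; /* true: reference still held */
    ops->released++;
    toy_module_put(ops->owner);
    return safe;
}
```

      Both functions return whether the release step ran while the module was
      still guaranteed to be loaded.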
    • Tariq Toukan's avatar
      net/mlx5: Fix SQ offset in QPs with small RQ · 639505d4
      Tariq Toukan authored
      Correct the formula for calculating the RQ page remainder,
      which should be in byte granularity.  The result will be
      non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
      is a power of 2.
      
      Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
      SQ offset in strides granularity.
      
      Fixes: d7037ad7 ("net/mlx5: Fix QP fragmented buffer allocation")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      639505d4
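      The arithmetic described above, worked as a small sketch. The constants
      are assumptions for illustration (a 4096-byte page and a 64-byte send
      WQE basic block), and toy_sq_strides_offset() is a hypothetical helper,
      not the driver's function.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE   4096u /* assumed page size in bytes */
#define TOY_SEND_WQE_BB 64u   /* assumed SQ stride in bytes */

/*
 * The RQ page remainder is taken in bytes; dividing it by the SQ stride
 * gives the SQ offset in units of strides. Because an RQ size is a power
 * of two, the remainder is non-zero only for RQs smaller than a page.
 */
size_t toy_sq_strides_offset(size_t rq_byte_size)
{
    size_t rq_page_remainder = rq_byte_size % TOY_PAGE_SIZE;

    return rq_page_remainder / TOY_SEND_WQE_BB;
}
```

      Under these assumptions a 1024-byte RQ places the SQ 16 strides into the
      page, while any RQ of one page or more yields an offset of 0.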
    • Greentime Hu's avatar
      nds32: fix build error because of wrong semicolon · ec865393
      Greentime Hu authored
      The trailing semicolon in the ELF_DATA define must be removed. A macro
      that is used inside an expression must not end with a semicolon.
      
      /kisskb/src/arch/nds32/include/asm/elf.h:126:29: error: expected '}' before ';' token
       #define ELF_DATA ELFDATA2LSB;
                                   ^
      /kisskb/src/fs/proc/kcore.c:318:17: note: in expansion of macro 'ELF_DATA'
           [EI_DATA] = ELF_DATA,
                       ^~~~~~~~
      /kisskb/src/fs/proc/kcore.c:312:15: note: to match this '{'
          .e_ident = {
                     ^
      /kisskb/src/scripts/Makefile.build:307: recipe for target 'fs/proc/kcore.o' failed

      Signed-off-by: Greentime Hu <greentime@andestech.com>
      ec865393
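      A minimal reproduction of the pattern, using TOY_-prefixed names that
      are hypothetical and only mirror the ELF constants in the error output
      above. With a trailing semicolon in the define, the initializer would
      expand to "= 1;," and fail to compile in the same way.

```c
/* Hypothetical mirrors of ELFDATA2LSB, EI_DATA and EI_NIDENT. */
#define TOY_ELFDATA2LSB 1
#define TOY_ELF_DATA    TOY_ELFDATA2LSB /* correct: no trailing semicolon */

enum { TOY_EI_DATA = 5, TOY_EI_NIDENT = 16 };

/* The macro is expanded inside an initializer list, as in fs/proc/kcore.c. */
static const unsigned char toy_ident[TOY_EI_NIDENT] = {
    [TOY_EI_DATA] = TOY_ELF_DATA,
};

unsigned char toy_ident_data(void)
{
    return toy_ident[TOY_EI_DATA];
}
```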
    • Greentime Hu's avatar
      nds32: Fix a kernel panic issue because of wrong frame pointer access. · 0cde56e0
      Greentime Hu authored
      This fix ensures that trace_hardirqs_off/trace_hardirqs_on get a correct
      return address via the frame pointer through __builtin_return_address().
      
      Unable to handle kernel paging request at virtual address fffffffc
      pgd = 3c42e9cf
      [fffffffc] *pgd=02a9c000
      
      Internal error: Oops: 1 [#1]
      Modules linked in:
      CPU: 0
      PC is at trace_hardirqs_off+0x78/0xec
      LP is at common_exception_handler+0xda/0xf4
      pc : [<b23ea5a4>]    lp : [<b2352eba>]    Tainted: G        W
      sp : ada60ab0  fp : efcaff48  gp : 3a020490
      r25: efcb0000  r24: 00000000
      r23: 00000000  r22: 00000000  r21: 00000000  r20: 000700c1
      r19: 000700ca  r18: 3a21b018  r17: 00000001  r16: 00000002
      r15: 00000001  r14: 0000002a  r13: 3a00a804  r12: ada60ab0
      r11: 3a113af8  r10: 3a01c530  r9 : 3a124404  r8 : 00120f9c
      r7 : b2352eba  r6 : 00000000  r5 : 3a126b58  r4 : 00000000
      r3 : 3a1726a8  r2 : b2921000  r1 : 00000000  r0 : 00000000
        IRQs off  Segment user
      Process init (pid: 1, stack limit = 0x069d7f15)
      Stack: (0xada60ab0 to 0xada61000)
      Stack: 0aa0:                                     00000000 00000003 3a110000 0011f000
      Stack: 0ac0: 00000005 00000000 00000000 00000000 ada60b10 3a01fe68 ada60b0c ada60b08
      Stack: 0ae0: 00000000 ada60ab8 ada60b30 3a020550 00000000 00000001 3a11c2f8 3a01c6e8
      Stack: 0b00: 3a01cb80 fffffba8 3a113af8 3a21b018 3a122c28 00003ec4 00000165 00000000
      Stack: 0b20: 3a126aec 0000006c 00000000 00000001 3a01fe68 00000000 00000003 00000000
      Stack: 0b40: 00000001 000003f8 3a020930 3a01c530 00000008 ada60c18 3a020490 3a003120
      Stack: 0b60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0b80: 00000000 00000000 00000000 00000000 ffff8000 00000000 00000000 00000000
      Stack: 0ba0: 00000000 00000001 3a020550 00000000 3a01d020 00000000 fffff000 fffff000
      Stack: 0bc0: 00000000 00000000 00000000 00000000 ada60f2c 00000000 00000001 00000000
      Stack: 0be0: 00000000 00000000 3a01fe68 fffffab0 00008034 00000008 3a0010cc 3a01fe68
      Stack: 0c00: 00000000 00000000 00000001 ada60c88 3a020490 3a0139d4 0009dc6f 00000000
      Stack: 0c20: 00000000 00000000 ada60fce fffff000 00000000 0000ebe0 3a020038 3a020550
      Stack: 0c40: ada60f20 ada60c90 3a0007f0 3a0002a8 ada60c8c 00000000 00000000 ada60c88
      Stack: 0c60: 3a020490 3a004570 00000000 00000000 ada60f20 3a0007f0 3a000000 00000000
      Stack: 0c80: 3a020490 3a004850 00000000 3a013f24 3a000000 00000000 3a01ff44 00000000
      Stack: 0ca0: 00000000 00000000 00000000 00000000 00000000 00000000 3a01ff84 3a01ff7c
      Stack: 0cc0: 3a01ff4c 3a01ff5c 3a01ff64 3a01ff9c 3a01ffa4 3a01ffac 3a01ff6c 3a01ff74
      Stack: 0ce0: 00000000 00000000 3a01ff44 00000000 00000000 00000000 00000000 00000000
      Stack: 0d00: 3a01ff8c 00000000 00000000 3a01ff94 00000000 00000000 00000000 00000000
      Stack: 0d20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0d40: 3a01ffbc 3a01ffb4 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0d60: 00000000 00000000 00000000 00000000 00000000 3a01ffc4 00000000 00000000
      Stack: 0d80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0da0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0dc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3a01ff54
      Stack: 0de0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0e00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0e20: 00000000 00000004 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0e40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0ea0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Stack: 0ec0: 00000000 00000000 00000000 00000000 ffffffff 00000000 00000000 00000000
      Stack: 0ee0: 00000000 00000000 00000000 00000000 ada60f20 00000000 00000000 00000000
      Stack: 0f00: 00000000 00000000 00000000 00000000 00000000 00000000 3a020490 3a000b24
      Stack: 0f20: 00000001 ada60fde 00000000 ada60fe4 ada60feb 00000000 00000021 3a038000
      Stack: 0f40: 00000010 0009dc6f 00000006 00001000 00000011 00000064 00000003 00008034
      Stack: 0f60: 00000004 00000020 00000005 00000008 00000007 3a000000 00000008 00000000
      Stack: 0f80: 00000009 0000ebe0 0000000b 00000000 0000000c 00000000 0000000d 00000000
      Stack: 0fa0: 0000000e 00000000 00000017 00000000 00000019 ada60fce 0000001f ada60ff6
      Stack: 0fc0: 00000000 00000000 00000000 b5010000 fa839914 23b5dd89 a2aea540 692fc82e
      Stack: 0fe0: 0074696e 454d4f48 54002f3d 3d4d5245 756e696c 692f0078 0074696e 00000000
      CPU: 0 PID: 1 Comm: init Tainted: G        W         4.18.0-00015-g1888b64a2558-dirty #112
      Hardware name: andestech,ae3xx (DT)
      Call Trace:
      [<b27a8e34>] dump_stack+0x2c/0x38
      [<b2354874>] die+0x128/0x18c
      [<b2356f4c>] do_page_fault+0x3b8/0x4e0
      [<b2352ed4>] ret_from_exception+0x0/0x10
      [<b2352eba>] common_exception_handler+0xda/0xf4

      Signed-off-by: Greentime Hu <greentime@andestech.com>
      0cde56e0