1. 30 Nov, 2017 10 commits
    • lib/mpi: call cond_resched() from mpi_powm() loop · 443d26a6
      Eric Biggers authored
      commit 1d9ddde1 upstream.
      
      On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
      largest permitted inputs (16384 bits), the kernel spends 10+ seconds
      doing modular exponentiation in mpi_powm() without rescheduling.  If all
      threads do it, it locks up the system.  Moreover, it can cause
      rcu_sched-stall warnings.
      
      Notwithstanding the insanity of doing this calculation in kernel mode
      rather than in userspace, fix it by calling cond_resched() as each bit
      from the exponent is processed.  It's still noninterruptible, but at
      least it's preemptible now.
      
      Do the cond_resched() once per bit rather than once per MPI limb because
      each limb might still easily take 100+ milliseconds on slow CPUs.
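The per-bit scheduling point described above can be sketched in userspace C. This is only an illustration, not the kernel's mpi_powm(): `modpow()` and `yield_point()` are hypothetical names, the operands are 32-bit rather than multi-precision, and `yield_point()` merely counts calls where the kernel would call cond_resched().

```c
#include <stdint.h>

/* Hypothetical stand-in for cond_resched(); it only counts how often it ran. */
static unsigned long yield_calls;

static void yield_point(void)
{
	yield_calls++;
}

/*
 * Left-to-right square-and-multiply: computes base^exp mod m.
 * Assumes m is nonzero; m fits in 32 bits, so the 64-bit products
 * below cannot overflow.
 */
static uint32_t modpow(uint32_t base, uint32_t exp, uint32_t m)
{
	uint64_t result = 1 % m;
	uint64_t b = base % m;
	int bit;

	for (bit = 31; bit >= 0; bit--) {
		result = (result * result) % m;		/* square for every bit */
		if ((exp >> bit) & 1)
			result = (result * b) % m;	/* multiply for set bits */
		yield_point();	/* one scheduling point per exponent bit */
	}
	return (uint32_t)result;
}
```

Placing the yield inside the bit loop bounds the work between scheduling points to one squaring plus at most one multiplication, which is the granularity the commit argues for.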

      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched: Make resched_cpu() unconditional · fb8bd56e
      Paul E. McKenney authored
      commit 7c2102e5 upstream.
      
      The current implementation of synchronize_sched_expedited() incorrectly
      assumes that resched_cpu() is unconditional, which it is not.  This means
      that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
      fails as follows (analysis by Neeraj Upadhyay):
      
      o	CPU1 is waiting for expedited wait to complete:
      
      	sync_rcu_exp_select_cpus
      	     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
      	     IPI sent to CPU5
      
      	synchronize_sched_expedited_wait
      		 ret = swait_event_timeout(rsp->expedited_wq,
      					   sync_rcu_preempt_exp_done(rnp_root),
      					   jiffies_stall);
      
      	expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
      
      o	CPU5 handles IPI and fails to acquire rq lock.
      
      	Handles IPI
      	     sync_sched_exp_handler
      		 resched_cpu
      		     returns while failing to try lock acquire rq->lock
      		 need_resched is not set
      
      o	CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
      	idle (schedule() is not called).
      
      o	CPU 1 reports RCU stall.
      
      Given that resched_cpu() is now used only by RCU, this commit fixes the
      assumption by making resched_cpu() unconditional.

      Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vsock: use new wait API for vsock_stream_sendmsg() · 6be6e48d
      WANG Cong authored
      commit 499fde66 upstream.
      
      As reported by Michal, vsock_stream_sendmsg() could still
      sleep at vsock_stream_has_space() after prepare_to_wait():
      
        vsock_stream_has_space
          vmci_transport_stream_has_space
            vmci_qpair_produce_free_space
              qp_lock
                qp_acquire_queue_mutex
                  mutex_lock
      
      Just switch to the new wait API like we did for commit
      d9dc8b0f ("net: fix sleeping for sk_wait_event()").

      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: "Jorgen S. Hansen" <jhansen@vmware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER · 41e4fbdf
      WANG Cong authored
      commit 76da0704 upstream.
      
      In commit 242d3a49 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
      I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
      unfortunately, as reported by jeffy, netdev_wait_allrefs()
      could rebroadcast NETDEV_UNREGISTER event until all refs are
      gone.
      
      We have to add an additional check to avoid this corner case.
      For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
      for dev_change_net_namespace(), dev->reg_state is
      NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
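The reg_state check can be modelled in a few lines of standalone C. This is a simplified sketch, not the kernel's ip6_route_dev_notify(): `struct dev_model`, `route_dev_notify()` and the `route_refs` counter are hypothetical, and only the two registration states named above are modelled.

```c
/* Only the two states named in the commit message; the kernel has more. */
enum reg_state_model { NETREG_REGISTERED, NETREG_UNREGISTERED };
enum netdev_event_model { NETDEV_REGISTER, NETDEV_UNREGISTER };

struct dev_model {
	enum reg_state_model reg_state;
	int route_refs;		/* hypothetical per-device reference count */
};

/* Returns 1 if the event was acted on, 0 if it was ignored. */
static int route_dev_notify(struct dev_model *dev, enum netdev_event_model ev)
{
	if (ev == NETDEV_REGISTER) {
		dev->route_refs++;
		return 1;
	}
	/* Skip rebroadcast NETDEV_UNREGISTER events for devices already unregistered. */
	if (ev == NETDEV_UNREGISTER && dev->reg_state != NETREG_UNREGISTERED) {
		dev->route_refs--;
		return 1;
	}
	return 0;
}
```

Without the state check, every rebroadcast would drop the reference again and unbalance the count.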
      
      Fixes: 242d3a49 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
      Reported-by: jeffy <jeffy.chen@rock-chips.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mm: fix use-after-free of vma during userfaultfd fault · d0629c6b
      Vlastimil Babka authored
      commit cb0631fd upstream.
      
      Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
      __do_page_fault() with the following reproducer:
      
        mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
        mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
        r0 = userfaultfd(0x0)
        ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
        ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
        r1 = gettid()
        syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
        tkill(r1, 0x7)
      
      The vma should be pinned by mmap_sem, but handle_userfault() might (in a
      return to userspace scenario) release it and then acquire again, so when
      we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
      the vma might be gone.
      
      Specifically, per Andrea the scenario is
       "A return to userland to repeat the page fault later with a
        VM_FAULT_NOPAGE retval (potentially after handling any pending signal
        during the return to userland). The return to userland is identified
        whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
        vmf->flags"
      
      However, since commit a3c4fb7c ("x86/mm: Fix fault error path using
      unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
      that point, which can thus become use-after-free.  Fix this by moving
      the read before calling handle_mm_fault().
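The shape of the fix, reading the value before a call that may invalidate the object, can be sketched as follows. All names here are hypothetical models rather than kernel code: the pkey bit layout is invented, and the handler overwrites the structure instead of freeing it so that the example stays well defined.

```c
#include <string.h>

struct vma_model {
	unsigned long vm_flags;
};

/* Hypothetical layout: the protection key lives in bits 4..7 of vm_flags. */
static unsigned int vma_pkey_model(const struct vma_model *vma)
{
	return (unsigned int)((vma->vm_flags >> 4) & 0xf);
}

typedef int (*fault_handler_fn)(struct vma_model *vma);

/* Models a fault handler after which the vma can no longer be trusted. */
static int handler_invalidates_vma(struct vma_model *vma)
{
	memset(vma, 0xff, sizeof(*vma));
	return 0;
}

static unsigned int fault_path(struct vma_model *vma, fault_handler_fn handle)
{
	unsigned int pkey = vma_pkey_model(vma);	/* read while vma is still valid */

	handle(vma);		/* vma must not be dereferenced after this call */
	return pkey;		/* the error path uses the saved copy */
}
```

Saving the value in a local costs one extra read on every fault, but it removes the need to reason about whether the vma survived the handler.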

      Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Fixes: a3c4fb7c ("x86/mm: Fix fault error path using unsafe vma pointer")
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI / EC: Fix regression related to triggering source of EC event handling · 7003eb63
      Lv Zheng authored
      commit 53c5eaab upstream.
      
      Originally the Samsung quirks removed by commit 4c237371 can be covered
      by commit e923e8e7 and ec_freeze_events=Y mode. But commit 9c40f956
      changed ec_freeze_events=Y back to N, making this problem re-surface.
      
      Actually, if commit e923e8e7 is robust enough, we can freely change
      ec_freeze_events mode, so this patch fixes the issue by improving
      commit e923e8e7.
      
      Related commits listed in the merged order:
      
       Commit: e923e8e7
       Subject: ACPI / EC: Fix an issue that SCI_EVT cannot be detected
                after event is enabled
      
       Commit: 4c237371
       Subject: ACPI / EC: Remove old CLEAR_ON_RESUME quirk
      
       Commit: 9c40f956
       Subject: Revert "ACPI / EC: Enable event freeze mode..." to fix
                a regression
      
      This patch not only fixes the reported post-resume EC event triggering
      source issue, but also fixes an unreported similar issue related to the
      driver bind by adding EC event triggering source in ec_install_handlers().
      
      Fixes: e923e8e7 (ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled)
      Fixes: 4c237371 (ACPI / EC: Remove old CLEAR_ON_RESUME quirk)
      Fixes: 9c40f956 (Revert "ACPI / EC: Enable event freeze mode..." to fix a regression)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196833
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Reported-by: Alistair Hamilton <ahpatent@gmail.com>
      Tested-by: Alistair Hamilton <ahpatent@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/disassembler: increase show_code buffer size · 7160a447
      Vasily Gorbik authored
      commit b192571d upstream.
      
      Current buffer size of 64 is too small. objdump shows that there are
      instructions which would require a buffer of up to 75 bytes (with current
      formatting). 128 bytes "ought to be enough for anybody".
      
      Also replaces 8 spaces with a single tab to reduce the memory footprint.
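The sizing issue can be illustrated in userspace with the bounded snprintf(), which reports how many bytes the output needs. This is not what the patch does (it enlarges the buffer); the hypothetical helper `format_fits()` only shows why 64 bytes is too small for a 75-byte line while 128 bytes is enough.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Formats text into buf without writing past size bytes.
 * Returns 1 if the whole string (plus its terminating NUL) fit,
 * 0 if the output was truncated or formatting failed.
 */
static int format_fits(char *buf, size_t size, const char *text)
{
	int needed = snprintf(buf, size, "%s", text);

	return needed >= 0 && (size_t)needed < size;
}
```

A line of 74 characters plus the NUL terminator needs 75 bytes, so `format_fits()` reports truncation for a 64-byte buffer and success for a 128-byte one.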
      
      Fixes the following KASAN finding:
      
      BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538
      Write of size 1 at addr 000000005a4a75a0 by task bash/1282
      
      CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215
      Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
      Call Trace:
      ([<000000000011eeb6>] show_stack+0x56/0x88)
       [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0
       [<00000000004e2994>] print_address_description+0xf4/0x288
       [<00000000004e2cf2>] kasan_report+0x13a/0x230
       [<0000000000e38ae6>] number+0x3fe/0x538
       [<0000000000e3dfe4>] vsnprintf+0x194/0x948
       [<0000000000e3ea42>] sprintf+0xa2/0xb8
       [<00000000001198dc>] print_insn+0x374/0x500
       [<0000000000119346>] show_code+0x4ee/0x538
       [<000000000011f234>] show_registers+0x34c/0x388
       [<000000000011f2ae>] show_regs+0x3e/0xa8
       [<000000000011f502>] die+0x1ea/0x2e8
       [<0000000000138f0e>] do_no_context+0x106/0x168
       [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0
       [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0
       [<000000000090639e>] sysrq_handle_crash+0x46/0x58
      ([<0000000000000007>] 0x7)
       [<00000000009073fa>] __handle_sysrq+0x102/0x218
       [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100
       [<000000000061d67a>] proc_reg_write+0xb2/0x128
       [<0000000000520be6>] __vfs_write+0xee/0x368
       [<0000000000521222>] vfs_write+0x21a/0x278
       [<000000000052156a>] SyS_write+0xda/0x178
       [<0000000000e555cc>] system_call+0xc4/0x270
      
      The buggy address belongs to the page:
      page:000003d1016929c0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x0()
      raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000
      raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
       000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00
      >000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
                                     ^
       000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8
       000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00
      ==================================================================

      Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/disassembler: add missing end marker for e7 table · 53809960
      Heiko Carstens authored
      commit 5c505387 upstream.
      
      The e7 opcode table does not have an end marker. Hence when trying to
      find an unknown e7 instruction the code will access memory behind the
      table until it finds something that matches the opcode, or the kernel
      crashes, whatever comes first.
      
      This affects not only the in-kernel disassembler but also uprobes and
      kprobes which refuse to set a probe on unknown instructions, and
      therefore search the opcode tables to figure out if instructions are
      known or not.
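A sentinel-terminated lookup table, the pattern this fix restores, might look like the sketch below. The entry layout, the opcode values and the names `op_a` to `op_c` are invented for illustration and do not correspond to real s390 instructions.

```c
#include <stddef.h>

struct insn_model {
	unsigned char opfrag;	/* opcode fragment to match */
	const char *name;	/* NULL marks the end of the table */
};

/* Hypothetical entries; the last element is the end marker. */
static const struct insn_model opcode_table[] = {
	{ 0x01, "op_a" },
	{ 0x02, "op_b" },
	{ 0x03, "op_c" },
	{ 0x00, NULL },
};

/* Returns the instruction name, or NULL if the opcode is unknown. */
static const char *find_insn(const struct insn_model *table, unsigned char opfrag)
{
	const struct insn_model *entry;

	/* The loop stops at the end marker instead of running off the table. */
	for (entry = table; entry->name != NULL; entry++) {
		if (entry->opfrag == opfrag)
			return entry->name;
	}
	return NULL;
}
```

If the `{ 0x00, NULL }` entry were missing, a lookup for an unknown opcode would keep reading whatever memory follows the array, which is the behaviour the commit describes.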
      
      Fixes: 3585cb02 ("s390/disassembler: add vector instructions")
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/runtime instrumention: fix possible memory corruption · 550435a1
      Heiko Carstens authored
      commit d6e646ad upstream.
      
      For PREEMPT enabled kernels the runtime instrumentation (RI) code
      contains a possible use-after-free bug. If a task that makes use of RI
      exits, it will execute do_exit() while still enabled for preemption.
      
      That function will call exit_thread_runtime_instr() via
      exit_thread(). If exit_thread_runtime_instr() gets preempted after the
      RI control block of the task has been freed but before the pointer to
      it is set to NULL, then save_ri_cb(), called from switch_to(), will
      write to already freed memory.
      
      Avoid this and simply disable preemption while freeing the control
      block and setting the pointer to NULL.
      
      Fixes: e4b8b3f3 ("s390: add support for runtime instrumentation")
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: fix transactional execution control register handling · c9d0db61
      Heiko Carstens authored
      commit a1c5befc upstream.
      
      Dan Horák reported the following crash related to transactional execution:
      
      User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
      CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
      Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      task: 00000000fafc8000 task.stack: 00000000fafc4000
      User PSW : 0705200180000000 000003ff93c14e70
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
      User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
                 0000000000000000 0000000000000002 0000000000000000 000003ff00000000
                 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
                 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
      User Code: 000003ff93c14e62: 60e0b030            std     %f14,48(%r11)
                 000003ff93c14e66: 60f0b038            std     %f15,56(%r11)
                #000003ff93c14e6a: e5600000ff0e        tbegin  0,65294
                >000003ff93c14e70: a7740006            brc     7,3ff93c14e7c
                 000003ff93c14e74: a7080000            lhi     %r0,0
                 000003ff93c14e78: a7f40023            brc     15,3ff93c14ebe
                 000003ff93c14e7c: b2220000            ipm     %r0
                 000003ff93c14e80: 8800001c            srl     %r0,28
      
      There are several bugs with control register handling with respect to
      transactional execution:
      
      - on task switch update_per_regs() is only called if the next task has
        an mm (is not a kernel thread). This however is incorrect. This
        breaks e.g. for user mode helper handling, where the kernel creates
        a kernel thread and then execve's a user space program. Control
        register contents related to transactional execution won't be
        updated on execve. If the previous task ran with transactional
        execution disabled then the new task will also run with
        transactional execution disabled, which is incorrect. Therefore call
        update_per_regs() unconditionally within switch_to().
      
      - on startup the transactional execution facility is not enabled for
        the idle thread. This is not really a bug, but an inconsistency to
        other facilities. Therefore enable the facility if it is available.
      
      - on fork the new thread's per_flags field is not cleared. This means
        that a child process inherits the PER_FLAG_NO_TE flag. This flag can
        be set with a ptrace request to disable transactional execution for
        the current process. It should not be inherited by new child
        processes in order to be consistent with the handling of all other
        PER related debugging options. Therefore clear the per_flags field in
        copy_thread_tls().

      Reported-and-tested-by: Dan Horák <dan@danny.cz>
      Fixes: d35339a4 ("s390: add support for transactional memory")
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 24 Nov, 2017 27 commits
    • Linux 4.9.65 · 133e6ccf
      Greg Kroah-Hartman authored
    • mm/pagewalk.c: report holes in hugetlb ranges · ceaec6e8
      Jann Horn authored
      commit 373c4557 upstream.
      
      This matters at least for the mincore syscall, which will otherwise copy
      uninitialized memory from the page allocator to userspace.  It is
      probably also a correctness error for /proc/$pid/pagemap, but I haven't
      tested that.
      
      Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
      no effect because the caller already checks for that.
      
      This only reports holes in hugetlb ranges to callers who have specified
      a hugetlb_entry callback.
      
      This issue was found using an AFL-based fuzzer.
      
      v2:
       - don't crash on ->pte_hole==NULL (Andrew Morton)
       - add Cc stable (Andrew Morton)
      
      Changed for 4.4/4.9 stable backport:
       - fix up conflict in the huge_pte_offset() call
      
      Fixes: 1e25a271 ("mincore: apply page table walker on do_mincore()")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • coda: fix 'kernel memory exposure attempt' in fsync · fae59471
      Jan Harkes authored
      commit d337b66a upstream.
      
      When an application called fsync on a file in Coda, a small request with
      just the file identifier was allocated, but the declared length was set
      to the size of the union of all possible upcall requests.
      
      This bug has been around for a very long time and is now caught by the
      extra checking in usercopy that was introduced in Linux-4.8.
      
      The exposure happens when the Coda cache manager process reads the fsync
      upcall request at which point it is killed. As a result there is nobody
      servicing any further upcalls, trapping any processes that try to access
      the mounted Coda filesystem.
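The mismatch between the allocated size and the declared length can be shown with a small model. The structures below are hypothetical and much simpler than Coda's real upcall definitions; `copy_is_safe()` stands in for the kind of bounds check that rejects a copy longer than the object it reads from.

```c
#include <stddef.h>

/* Hypothetical, simplified request layouts. */
struct fsync_req_model {
	unsigned int opcode;
	unsigned int unique;
	unsigned int fid[4];
};

struct open_req_model {
	unsigned int opcode;
	unsigned int unique;
	char path[256];
};

union upcall_req_model {
	struct fsync_req_model fsync;
	struct open_req_model open;
};

/* Buggy: declares the size of the whole union for a small request. */
static size_t declared_len_buggy(void)
{
	return sizeof(union upcall_req_model);
}

/* Fixed: declares only what was actually allocated. */
static size_t declared_len_fixed(void)
{
	return sizeof(struct fsync_req_model);
}

/* A copy is safe only if it does not read past the allocation. */
static int copy_is_safe(size_t allocated, size_t declared)
{
	return declared <= allocated;
}
```

With these layouts the buggy length exceeds the fsync allocation, so a length-checked copy to userspace would be refused, while the fixed length passes.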

      Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/page_alloc.c: broken deferred calculation · 9980b827
      Pavel Tatashin authored
      commit d135e575 upstream.
      
      In reset_deferred_meminit() we determine number of pages that must not
      be deferred.  We initialize pages for at least 2G of memory, but also
      pages for reserved memory in this node.
      
      The reserved memory is determined in this function:
      memblock_reserved_memory_within(), which operates over physical
      addresses, and returns size in bytes.  However, reset_deferred_meminit()
      assumes that this function operates with pfns, and returns page
      count.
      
      The result is that in the best case the machine boots slower than expected
      due to initializing more pages than needed in a single thread, and in the
      worst case it panics because fewer pages than needed are initialized early.
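The unit mismatch is easy to see with a byte-to-page conversion. This sketch assumes 4 KiB pages and rounds up; `PAGE_SHIFT_MODEL` and `bytes_to_pages_up()` are hypothetical names, and the rounding direction is a choice made for the example rather than something taken from the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumes 4 KiB pages */

/* Converts a size in bytes to a page count, rounding partial pages up. */
static uint64_t bytes_to_pages_up(uint64_t bytes)
{
	uint64_t page_size = 1ULL << PAGE_SHIFT_MODEL;

	return (bytes + page_size - 1) >> PAGE_SHIFT_MODEL;
}
```

For example, 2 GiB is 524288 pages of 4 KiB, so treating a byte count as a page count overstates the number by a factor of 4096.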
      
      Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
      Fixes: 864b9a39 ("mm: consider memblock reservations for deferred memory initialization sizing")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipmi: fix unsigned long underflow · 55b06b0f
      Corey Minyard authored
      commit 392a17b1 upstream.
      
      When I set the timeout to a specific value such as 500ms, the timeout
      event will not happen in time due to the overflow in function
      check_msg_timeout:
      ...
      	ent->timeout -= timeout_period;
      	if (ent->timeout > 0)
      		return;
      ...
      
      The type of timeout_period is long, but ent->timeout is unsigned long.
      This patch makes the type consistent.
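The wraparound can be reproduced in plain C. The two helpers below are hypothetical reductions of the check quoted above, one with an unsigned timeout and one with a signed timeout; they are not the driver's code.

```c
/* Unsigned timeout: subtracting a larger period wraps to a huge value. */
static int still_pending_unsigned(unsigned long timeout, long period)
{
	timeout -= period;	/* wraps around when period > timeout */
	return timeout > 0;	/* true for everything except exactly zero */
}

/* Signed timeout: the result goes negative and the entry expires. */
static int still_pending_signed(long timeout, long period)
{
	timeout -= period;
	return timeout > 0;
}
```

With a 500 ms timeout and a 1000 ms period, the unsigned version still reports the entry as pending, while the signed version reports it as expired.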

      Reported-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Tested-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2: should wait dio before inode lock in ocfs2_setattr() · 8af77738
      alex chen authored
      commit 28f5a8a7 upstream.
      
      We should wait for dio requests to finish before taking the inode lock in
      ocfs2_setattr(), otherwise the following deadlock will happen:
      
      process 1                  process 2                    process 3
      truncate file 'A'          end_io of writing file 'A'   receiving the bast messages
      ocfs2_setattr
       ocfs2_inode_lock_tracker
        ocfs2_inode_lock_full
       inode_dio_wait
        __inode_dio_wait
        -->waiting for all dio
        requests finish
                                                              dlm_proxy_ast_handler
                                                               dlm_do_local_bast
                                                                ocfs2_blocking_ast
                                                                 ocfs2_generic_handle_bast
                                                                  set OCFS2_LOCK_BLOCKED flag
                              dio_end_io
                               dio_bio_end_aio
                                dio_complete
                                 ocfs2_dio_end_io
                                  ocfs2_dio_end_io_write
                                   ocfs2_inode_lock
                                    __ocfs2_cluster_lock
                                     ocfs2_wait_for_mask
                                     -->waiting for OCFS2_LOCK_BLOCKED
                                     flag to be cleared, that is waiting
                                     for 'process 1' unlocking the inode lock
                                 inode_dio_end
                                 -->here dec the i_dio_count, but will never
                                 be called, so a deadlock happened.
      
      Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
      Signed-off-by: Alex Chen <alex.chen@huawei.com>
      Reviewed-by: Jun Piao <piaojun@huawei.com>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Acked-by: Changwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2: fix cluster hang after a node dies · a8356445
      Changwei Ge authored
      commit 1c019671 upstream.
      
      When a node dies, other live nodes have to choose a new master for an
      existing lock resource mastered by the dead node.

      In the ocfs2/dlm implementation, this is done by the function
      dlm_move_lockres_to_recovery_list, which marks those lock resources as
      DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
      changes the lock resource's master later.

      So without invoking dlm_move_lockres_to_recovery_list, no master will be
      chosen after dlm recovery completes, since no lock resource can be
      found through the ::resource list.

      What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
      resources mastered by a dead node, it will break synchronization among
      nodes.
      
      So invoke dlm_move_lockres_to_recovery_list again.
      
      Fixes: ee8f7fcb ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
      Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
      Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
      Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: dmatest: warn user when dma test times out · 2bd38ece
      Adam Wallis authored
      commit a9df21e3 upstream.
      
      Commit adfa543e ("dmatest: don't use set_freezable_with_signal()")
      introduced a bug (that is in fact documented by the patch commit text)
      that leaves behind a dangling pointer. Since the done_wait structure is
      allocated on the stack, future invocations to the DMATEST can produce
      undesirable results (e.g., corrupted spinlocks). Ideally, this would be
      cleaned up in the thread handler, but at the very least, the kernel
      is left in a very precarious scenario that can lead to some long debug
      sessions when the crash comes later.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197605
      Signed-off-by: Adam Wallis <awallis@codeaurora.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: 8250_fintek: Fix finding base_port with activated SuperIO · e6d4a078
      Ji-Ze Hong (Peter Hong) authored
      commit fd97e66c upstream.
      
      The SuperIO is configured at boot time by the BIOS, but some BIOSes
      do not deactivate the SuperIO at the end of configuration. This
      leads to a mismatch for pdata->base_port in probe_setup_port(). So
      deactivate all SuperIO before activating the specific base_port in
      fintek_8250_enter_key().
      
      Tested on iBASE MI802.

      Tested-by: Ji-Ze Hong (Peter Hong) <hpeter+linux_kernel@gmail.com>
      Signed-off-by: Ji-Ze Hong (Peter Hong) <hpeter+linux_kernel@gmail.com>
      Reviewed-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: omap: Fix EFR write on RTS deassertion · 70eb4608
      Lukas Wunner authored
      commit 2a71de2f upstream.
      
      Commit 348f9bb3 ("serial: omap: Fix RTS handling") sought to enable
      auto RTS upon manual RTS assertion and disable it on deassertion.
      However, it seems the latter was done incorrectly: it clears all bits in
      the Extended Features Register *except* auto RTS.
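The difference between the two masks comes down to a missing bitwise NOT. The sketch below uses illustrative bit values and hypothetical names (`EFR_AUTO_RTS_MODEL` and friends); it models only the register arithmetic, not the driver.

```c
/* Illustrative bit assignments for this sketch only. */
#define EFR_ECB_MODEL		0x10
#define EFR_AUTO_RTS_MODEL	0x40
#define EFR_AUTO_CTS_MODEL	0x80

/* Buggy: keeps only the auto RTS bit and clears everything else. */
static unsigned char clear_auto_rts_buggy(unsigned char efr)
{
	return (unsigned char)(efr & EFR_AUTO_RTS_MODEL);
}

/* Fixed: clears only the auto RTS bit and keeps everything else. */
static unsigned char clear_auto_rts_fixed(unsigned char efr)
{
	return (unsigned char)(efr & ~EFR_AUTO_RTS_MODEL);
}
```

Starting from 0xD0 (all three bits set), the buggy mask yields 0x40 and the fixed mask yields 0x90.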
      
      Fixes: 348f9bb3 ("serial: omap: Fix RTS handling")
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ima: do not update security.ima if appraisal status is not INTEGRITY_PASS · 2cfbb32f
      Roberto Sassu authored
      commit 020aae3e upstream.
      
      Commit b65a9cfc ("Untangling ima mess, part 2: deal with counters")
      moved the call of ima_file_check() from may_open() to do_filp_open() at a
      point where the file descriptor is already opened.
      
      This breaks the assumption made by IMA that file descriptors being closed
      belong to files whose access was granted by ima_file_check(). The
      consequence is that security.ima and security.evm are updated with good
      values, regardless of the current appraisal status.
      
      For example, if a file does not have security.ima, IMA will create it after
      opening the file for writing, even if access is denied. Access to the file
      will be allowed afterwards.
      
      Avoid this issue by checking the appraisal status before updating
      security.ima.
      Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cfbb32f
    • Eric Biggers's avatar
      crypto: dh - Fix double free of ctx->p · aa15fe4d
      Eric Biggers authored
      commit 12d41a02 upstream.
      
      When setting the secret with the software Diffie-Hellman implementation,
      if allocating 'g' failed (e.g. if it was longer than
      MAX_EXTERN_MPI_BITS), then 'p' was freed twice: once immediately, and
      once later when the crypto_kpp tfm was destroyed.
      
      Fix it by using dh_free_ctx() (renamed to dh_clear_ctx()) in the error
      paths, as that correctly sets the pointers to NULL.
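
      The pattern of clearing pointers after freeing them can be sketched in
      userspace C. The struct and function names below are hypothetical
      stand-ins, with malloc()/free() in place of the kernel's MPI helpers; the
      real dh_clear_ctx() may differ in detail.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the DH context; fields mirror 'p' and 'g'. */
struct dh_ctx_sketch {
	void *p;
	void *g;
};

/* Free both values and reset the pointers so a later cleanup is a no-op. */
static void dh_clear_ctx_sketch(struct dh_ctx_sketch *ctx)
{
	free(ctx->p);
	free(ctx->g);
	ctx->p = NULL;
	ctx->g = NULL;
}

/* Returns 0 on success, -1 on failure; every error path clears the context. */
static int dh_set_params_sketch(struct dh_ctx_sketch *ctx, size_t p_len,
				size_t g_len, size_t max_len)
{
	ctx->p = malloc(p_len);
	if (!ctx->p)
		return -1;
	if (g_len > max_len) {
		/* Stands in for the failed allocation of 'g'. */
		dh_clear_ctx_sketch(ctx);
		return -1;
	}
	ctx->g = malloc(g_len);
	if (!ctx->g) {
		dh_clear_ctx_sketch(ctx);
		return -1;
	}
	return 0;
}
```

      Because the error path leaves both pointers NULL, calling the clear
      function again at teardown only passes NULL to free(), which is harmless.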
      
      KASAN report:
      
          MPI: mpi too large (32760 bits)
          ==================================================================
          BUG: KASAN: use-after-free in mpi_free+0x131/0x170
          Read of size 4 at addr ffff88006c7cdf90 by task reproduce_doubl/367
      
          CPU: 1 PID: 367 Comm: reproduce_doubl Not tainted 4.14.0-rc7-00040-g05298abde6fe #7
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           dump_stack+0xb3/0x10b
           ? mpi_free+0x131/0x170
           print_address_description+0x79/0x2a0
           ? mpi_free+0x131/0x170
           kasan_report+0x236/0x340
           ? akcipher_register_instance+0x90/0x90
           __asan_report_load4_noabort+0x14/0x20
           mpi_free+0x131/0x170
           ? akcipher_register_instance+0x90/0x90
           dh_exit_tfm+0x3d/0x140
           crypto_kpp_exit_tfm+0x52/0x70
           crypto_destroy_tfm+0xb3/0x250
           __keyctl_dh_compute+0x640/0xe90
           ? kasan_slab_free+0x12f/0x180
           ? dh_data_from_key+0x240/0x240
           ? key_create_or_update+0x1ee/0xb20
           ? key_instantiate_and_link+0x440/0x440
           ? lock_contended+0xee0/0xee0
           ? kfree+0xcf/0x210
           ? SyS_add_key+0x268/0x340
           keyctl_dh_compute+0xb3/0xf1
           ? __keyctl_dh_compute+0xe90/0xe90
           ? SyS_add_key+0x26d/0x340
           ? entry_SYSCALL_64_fastpath+0x5/0xbe
           ? trace_hardirqs_on_caller+0x3f4/0x560
           SyS_keyctl+0x72/0x2c0
           entry_SYSCALL_64_fastpath+0x1f/0xbe
          RIP: 0033:0x43ccf9
          RSP: 002b:00007ffeeec96158 EFLAGS: 00000246 ORIG_RAX: 00000000000000fa
          RAX: ffffffffffffffda RBX: 000000000248b9b9 RCX: 000000000043ccf9
          RDX: 00007ffeeec96170 RSI: 00007ffeeec96160 RDI: 0000000000000017
          RBP: 0000000000000046 R08: 0000000000000000 R09: 0248b9b9143dc936
          R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
          R13: 0000000000409670 R14: 0000000000409700 R15: 0000000000000000
      
          Allocated by task 367:
           save_stack_trace+0x16/0x20
           kasan_kmalloc+0xeb/0x180
           kmem_cache_alloc_trace+0x114/0x300
           mpi_alloc+0x4b/0x230
           mpi_read_raw_data+0xbe/0x360
           dh_set_secret+0x1dc/0x460
           __keyctl_dh_compute+0x623/0xe90
           keyctl_dh_compute+0xb3/0xf1
           SyS_keyctl+0x72/0x2c0
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 367:
           save_stack_trace+0x16/0x20
           kasan_slab_free+0xab/0x180
           kfree+0xb5/0x210
           mpi_free+0xcb/0x170
           dh_set_secret+0x2d7/0x460
           __keyctl_dh_compute+0x623/0xe90
           keyctl_dh_compute+0xb3/0xf1
           SyS_keyctl+0x72/0x2c0
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 802c7f1c ("crypto: dh - Add DH software implementation")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa15fe4d
    • Tudor-Dan Ambarus's avatar
      crypto: dh - fix memleak in setkey · 4a7e0231
      Tudor-Dan Ambarus authored
      commit ee34e264 upstream.
      
      setkey can be called multiple times during the existence
      of the transformation object. In case of multiple setkey calls,
      the old key was not freed and we leaked memory.
      Free the old MPI key if any.
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a7e0231
    • Eric W. Biederman's avatar
      net/sctp: Always set scope_id in sctp_inet6_skb_msgname · 67b718fc
      Eric W. Biederman authored
      
      [ Upstream commit 7c8a61d9 ]
      
      Alexander Potapenko, while testing the kernel with KMSAN and syzkaller,
      discovered that in some configurations sctp would leak 4 bytes of
      kernel stack.
      
      Working with his reproducer I discovered that the 4 leaked bytes
      are the scope id of an ipv6 address returned by recvmsg.
      
      With a little code inspection and a shrewd guess I discovered that
      sctp_inet6_skb_msgname only initializes the scope_id field for link
      local ipv6 addresses to the interface index the link local address
      pertains to instead of initializing the scope_id field for all ipv6
      addresses.
      
      That is almost reasonable as scope_id's are meaningful only for link
      local addresses.  Set the scope_id in all other cases to 0 which is
      not a valid interface index to make it clear there is nothing useful
      in the scope_id field.
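
      The idea of initializing scope_id on every path can be sketched in
      userspace C with the standard sockaddr_in6 type. The helper name and its
      parameters are hypothetical; this is not the sctp_inet6_skb_msgname()
      code itself.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: fill a sockaddr_in6 from an address and an ifindex. */
static void fill_msgname_sketch(struct sockaddr_in6 *out,
				const struct in6_addr *src,
				unsigned int ifindex)
{
	out->sin6_family = AF_INET6;
	out->sin6_port = 0;
	out->sin6_flowinfo = 0;
	out->sin6_addr = *src;
	/* Always write scope_id so no stale stack bytes reach the caller. */
	if (IN6_IS_ADDR_LINKLOCAL(src))
		out->sin6_scope_id = ifindex;
	else
		out->sin6_scope_id = 0;
}
```

      Even when the output structure starts out filled with garbage, a
      non-link-local address now yields a scope_id of 0.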
      
      There should be no danger of breaking userspace as the stack leak
      guaranteed that previously meaningless random data was being returned.
      
      Fixes: 372f525b ("SCTP:  Resync with LKSCTP tree.")
      History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Reported-by: Alexander Potapenko <glider@google.com>
      Tested-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      67b718fc
    • Huacai Chen's avatar
      fealnx: Fix building error on MIPS · f0ae7a1b
      Huacai Chen authored
      
      [ Upstream commit cc54c1d3 ]
      
      This patch tries to fix the build error on MIPS. The reason is that MIPS
      has already defined the LONG macro, which conflicts with the LONG enum
      in drivers/net/ethernet/fealnx.c.
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0ae7a1b
    • Xin Long's avatar
      sctp: do not peel off an assoc from one netns to another one · 362d2ce0
      Xin Long authored
      
      [ Upstream commit df80cd9b ]
      
      Now when peeling off an association to a sock in another netns, the
      transports in this assoc are not rehashed and keep using the old
      key in the hashtable.
      
      As a transport uses sk->net as the hash key to insert into the hashtable,
      closing the sock would fail to remove these transports from the hashtable
      because of the new netns, even though all transports are being freed. A
      later lookup of an assoc that dereferences those transports could then
      cause a use-after-free.
      
      This is a very old issue, present since the very beginning. ChunYu found
      it with syzkaller fuzz testing using this series:
      
        socket$inet6_sctp()
        bind$inet6()
        sendto$inet6()
        unshare(0x40000000)
        getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
        getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()
      
      This patch blocks the call when peeling one assoc off from one
      netns to another one, so that the netns of all transports cannot
      go out of sync with the key in the hashtable.
      
      Note that this patch didn't fix it by rehashing transports, as it's
      difficult to handle the situation when the tuple is already in use
      in the new netns. Besides, no one would like to peel off one assoc
      to another netns, considering ipaddrs, ifaces, etc. are usually
      different.
      Reported-by: ChunYu Wang <chunwang@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      362d2ce0
    • Jason A. Donenfeld's avatar
      af_netlink: ensure that NLMSG_DONE never fails in dumps · 99aa74ce
      Jason A. Donenfeld authored
      
      [ Upstream commit 0642840b ]
      
      The way people generally use netlink_dump is that they fill in the skb
      as much as possible, breaking when nla_put returns an error. Then, they
      get called again and start filling out the next skb, and again, and so
      forth. The mechanism at work here is the ability for the iterative
      dumping function to detect when the skb is filled up and not fill it
      past the brim, waiting for a fresh skb for the rest of the data.
      
      However, if the attributes are small and nicely packed, it is possible
      that a dump callback function successfully fills in attributes until the
      skb is of size 4080 (libmnl's default page-sized receive buffer size).
      The dump function completes, satisfied, and then, if it happens to be
      that this is actually the last skb, and no further ones are to be sent,
      then netlink_dump will add on the NLMSG_DONE part:
      
        nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
      
      It is very important that netlink_dump does this, of course. However, in
      this example, that call to nlmsg_put_answer will fail, because the
      previous filling by the dump function did not leave it enough room. And
      how could it possibly have done so? All of the nla_put variety of
      functions simply check to see if the skb has enough tailroom,
      independent of the context it is in.
      
      In order to keep the important assumptions of all netlink dump users, it
      is therefore important to give them an skb that has this end part of the
      tail already reserved, so that the call to nlmsg_put_answer does not
      fail. Otherwise, library authors are forced to find some bizarre sized
      receive buffer that has a large modulo relative to the common sizes of
      messages received, which is ugly and buggy.
      
      This patch thus saves the NLMSG_DONE for an additional message, for the
      case that things are dangerously close to the brim. This requires
      keeping track of the errno from ->dump() across calls.
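
      The decision described above can be sketched as a small predicate. The
      function name and parameters are hypothetical, and the real
      netlink_dump() logic involves more state than this shows.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: returns 1 when the end-of-dump marker has to wait
 * for a fresh buffer, either because the dump callback still has data to
 * send or because the remaining tailroom cannot hold the marker.
 */
static int must_defer_done_sketch(size_t tailroom, size_t done_size,
				  int more_data)
{
	return more_data || tailroom < done_size;
}
```

      For instance, with 16 bytes of tailroom left and a 20-byte marker, the
      marker is deferred to an additional message instead of failing.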
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      99aa74ce
    • Cong Wang's avatar
      vlan: fix a use-after-free in vlan_device_event() · 080ecd2b
      Cong Wang authored
      
      [ Upstream commit 052d41c0 ]
      
      After refcnt reaches zero, vlan_vid_del() could free
      dev->vlan_info via RCU:
      
      	RCU_INIT_POINTER(dev->vlan_info, NULL);
      	call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
      
      However, the pointer 'grp' still points to that memory
      since it is set before vlan_vid_del():
      
              vlan_info = rtnl_dereference(dev->vlan_info);
              if (!vlan_info)
                      goto out;
              grp = &vlan_info->grp;
      
      Depending on when that RCU callback is scheduled, we could
      trigger a use-after-free in vlan_group_for_each_dev()
      right after this vlan_vid_del().
      
      Fix it by moving vlan_vid_del() before setting grp. This
      is also symmetric to the vlan_vid_add() we call in
      vlan_device_event().
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Fixes: efc73f4b ("net: Fix memory leak - vlan_info struct")
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      080ecd2b
    • Andrey Konovalov's avatar
      net: usb: asix: fix null-ptr-deref in asix_suspend · 58baa36d
      Andrey Konovalov authored
      
      [ Upstream commit 8f562462 ]
      
      When asix_suspend() is called dev->driver_priv might not have been
      assigned a value, so we need to check that it's not NULL.
      
      A similar issue is present in asix_resume(); this patch fixes it as well.
      
      Found by syzkaller.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      Modules linked in:
      CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc4-43422-geccacdd69a8c #400
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: usb_hub_wq hub_event
      task: ffff88006bb36300 task.stack: ffff88006bba8000
      RIP: 0010:asix_suspend+0x76/0xc0 drivers/net/usb/asix_devices.c:629
      RSP: 0018:ffff88006bbae718 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff880061ba3b80 RCX: 1ffff1000c34d644
      RDX: 0000000000000001 RSI: 0000000000000402 RDI: 0000000000000008
      RBP: ffff88006bbae738 R08: 1ffff1000d775cad R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800630a8b40
      R13: 0000000000000000 R14: 0000000000000402 R15: ffff880061ba3b80
      FS:  0000000000000000(0000) GS:ffff88006c600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ff33cf89000 CR3: 0000000061c0a000 CR4: 00000000000006f0
      Call Trace:
       usb_suspend_interface drivers/usb/core/driver.c:1209
       usb_suspend_both+0x27f/0x7e0 drivers/usb/core/driver.c:1314
       usb_runtime_suspend+0x41/0x120 drivers/usb/core/driver.c:1852
       __rpm_callback+0x339/0xb60 drivers/base/power/runtime.c:334
       rpm_callback+0x106/0x220 drivers/base/power/runtime.c:461
       rpm_suspend+0x465/0x1980 drivers/base/power/runtime.c:596
       __pm_runtime_suspend+0x11e/0x230 drivers/base/power/runtime.c:1009
       pm_runtime_put_sync_autosuspend ./include/linux/pm_runtime.h:251
       usb_new_device+0xa37/0x1020 drivers/usb/core/hub.c:2487
       hub_port_connect drivers/usb/core/hub.c:4903
       hub_port_connect_change drivers/usb/core/hub.c:5009
       port_event drivers/usb/core/hub.c:5115
       hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
       process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
       worker_thread+0x221/0x1850 kernel/workqueue.c:2253
       kthread+0x3a1/0x470 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      Code: 8d 7c 24 20 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5b 48 b8 00 00
      00 00 00 fc ff df 4d 8b 6c 24 20 49 8d 7d 08 48 89 fa 48 c1 ea 03 <80>
      3c 02 00 75 34 4d 8b 6d 08 4d 85 ed 74 0b e8 26 2b 51 fd 4c
      RIP: asix_suspend+0x76/0xc0 RSP: ffff88006bbae718
      ---[ end trace dfc4f5649284342c ]---
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      58baa36d
    • Kristian Evensen's avatar
      qmi_wwan: Add missing skb_reset_mac_header-call · 4ad82095
      Kristian Evensen authored
      
      [ Upstream commit 0de0add1 ]
      
      When we receive a packet on a QMI device in raw IP mode, we should call
      skb_reset_mac_header() to ensure that skb->mac_header contains a valid
      offset in the packet. While it shouldn't really matter, as the packets have
      no MAC header and the interface is configured as such, it seems certain
      parts of the network stack expect a "good" value in skb->mac_header.
      
      Without the skb_reset_mac_header() call added in this patch, for example
      shaping traffic (using tc) triggers the following oops on the first
      received packet:
      
      [  303.642957] skbuff: skb_under_panic: text:8f137918 len:177 put:67 head:8e4b0f00 data:8e4b0eff tail:0x8e4b0fb0 end:0x8e4b1520 dev:wwan0
      [  303.655045] Kernel bug detected[#1]:
      [  303.658622] CPU: 1 PID: 1002 Comm: logd Not tainted 4.9.58 #0
      [  303.664339] task: 8fdf05e0 task.stack: 8f15c000
      [  303.668844] $ 0   : 00000000 00000001 0000007a 00000000
      [  303.674062] $ 4   : 8149a2fc 8149a2fc 8149ce20 00000000
      [  303.679284] $ 8   : 00000030 3878303a 31623465 20303235
      [  303.684510] $12   : ded731e3 2626a277 00000000 03bd0000
      [  303.689747] $16   : 8ef62b40 00000043 8f137918 804db5fc
      [  303.694978] $20   : 00000001 00000004 8fc13800 00000003
      [  303.700215] $24   : 00000001 8024ab10
      [  303.705442] $28   : 8f15c000 8fc19cf0 00000043 802cc920
      [  303.710664] Hi    : 00000000
      [  303.713533] Lo    : 74e58000
      [  303.716436] epc   : 802cc920 skb_panic+0x58/0x5c
      [  303.721046] ra    : 802cc920 skb_panic+0x58/0x5c
      [  303.725639] Status: 11007c03 KERNEL EXL IE
      [  303.729823] Cause : 50800024 (ExcCode 09)
      [  303.733817] PrId  : 0001992f (MIPS 1004Kc)
      [  303.737892] Modules linked in: rt2800pci rt2800mmio rt2800lib qcserial ppp_async option usb_wwan rt2x00pci rt2x00mmio rt2x00lib rndis_host qmi_wwan ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 mt76x2i
      Process logd (pid: 1002, threadinfo=8f15c000, task=8fdf05e0, tls=77b3eee4)
      [  303.962509] Stack : 00000000 80408990 8f137918 000000b1 00000043 8e4b0f00 8e4b0eff 8e4b0fb0
      [  303.970871]         8e4b1520 8fec1800 00000043 802cd2a4 6e000045 00000043 00000000 8ef62000
      [  303.979219]         8eef5d00 8ef62b40 8fea7300 8f137918 00000000 00000000 0002bb01 793e5664
      [  303.987568]         8ef08884 00000001 8fea7300 00000002 8fc19e80 8eef5d00 00000006 00000003
      [  303.995934]         00000000 8030ba90 00000003 77ab3fd0 8149dc80 8004d1bc 8f15c000 8f383700
      [  304.004324]         ...
      [  304.006767] Call Trace:
      [  304.009241] [<802cc920>] skb_panic+0x58/0x5c
      [  304.013504] [<802cd2a4>] skb_push+0x78/0x90
      [  304.017783] [<8f137918>] 0x8f137918
      [  304.021269] Code: 00602825  0c02a3b4  24842888 <000c000d> 8c870060  8c8200a0  0007382b  00070336  8c88005c
      [  304.031034]
      [  304.032805] ---[ end trace b778c482b3f0bda9 ]---
      [  304.041384] Kernel panic - not syncing: Fatal exception in interrupt
      [  304.051975] Rebooting in 3 seconds..
      
      While the oops is for a 4.9-kernel, I was able to trigger the same oops with
      net-next as of yesterday.
      
      Fixes: 32f7adf6 ("net: qmi_wwan: support "raw IP" mode")
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ad82095
    • Bjørn Mork's avatar
      net: qmi_wwan: fix divide by 0 on bad descriptors · 02a0c063
      Bjørn Mork authored
      
      [ Upstream commit 7fd07833 ]
      
      A CDC Ethernet functional descriptor with wMaxSegmentSize = 0 will
      cause a divide error in usbnet_probe:
      
      divide error: 0000 [#1] PREEMPT SMP KASAN
      Modules linked in:
      CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc8-44453-g1fdc1a82c34f #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: usb_hub_wq hub_event
      task: ffff88006bef5c00 task.stack: ffff88006bf60000
      RIP: 0010:usbnet_update_max_qlen+0x24d/0x390 drivers/net/usb/usbnet.c:355
      RSP: 0018:ffff88006bf67508 EFLAGS: 00010246
      RAX: 00000000000163c8 RBX: ffff8800621fce40 RCX: ffff8800621fcf34
      RDX: 0000000000000000 RSI: ffffffff837ecb7a RDI: ffff8800621fcf34
      RBP: ffff88006bf67520 R08: ffff88006bef5c00 R09: ffffed000c43f881
      R10: ffffed000c43f880 R11: ffff8800621fc406 R12: 0000000000000003
      R13: ffffffff85c71de0 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88006ca00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffe9c0d6dac CR3: 00000000614f4000 CR4: 00000000000006f0
      Call Trace:
       usbnet_probe+0x18b5/0x2790 drivers/net/usb/usbnet.c:1783
       qmi_wwan_probe+0x133/0x220 drivers/net/usb/qmi_wwan.c:1338
       usb_probe_interface+0x324/0x940 drivers/usb/core/driver.c:361
       really_probe drivers/base/dd.c:413
       driver_probe_device+0x522/0x740 drivers/base/dd.c:557
      
      Fix by simply ignoring the bogus descriptor, as it is optional
      for QMI devices anyway.
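
      Ignoring a zero wMaxSegmentSize can be sketched as below. Both helper
      names are hypothetical and the division only stands in for the one that
      faulted; the drivers' actual code paths are more involved.

```c
#include <stdint.h>

/* Hypothetical: adopt the descriptor's segment size only when it is non-zero. */
static uint32_t pick_hard_mtu_sketch(uint32_t current_mtu,
				     uint16_t w_max_segment_size)
{
	return w_max_segment_size ? w_max_segment_size : current_mtu;
}

/* Stand-in for a later computation that divides by the MTU. */
static uint32_t queue_len_sketch(uint32_t budget, uint32_t hard_mtu)
{
	return budget / hard_mtu;
}
```

      A bogus descriptor then leaves the previously configured MTU in place, so
      the later division never sees a zero divisor.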
      
      Fixes: 423ce8ca ("net: usb: qmi_wwan: New driver for Huawei QMI based WWAN devices")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      02a0c063
    • Bjørn Mork's avatar
      net: cdc_ether: fix divide by 0 on bad descriptors · f3766218
      Bjørn Mork authored
      
      [ Upstream commit 2cb80187 ]
      
      Setting dev->hard_mtu to 0 will cause a divide error in
      usbnet_probe. Protect against devices with bogus CDC Ethernet
      functional descriptors by ignoring a zero wMaxSegmentSize.
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3766218
    • Hangbin Liu's avatar
      bonding: discard lowest hash bit for 802.3ad layer3+4 · 6f239c06
      Hangbin Liu authored
      
      [ Upstream commit b5f86218 ]
      
      After commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
      in connect()"), we will try to use even ports for connect(). Then if an
      application (seen clearly with iperf) opens multiple streams to the same
      destination IP and port, each stream will be given an even source port.
      
      So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
      will always hash all these streams to the same interface, and the total
      throughput will be limited to a single slave.
      
      Changing the tcp code would impact tcp behavior as a whole, only for the
      sake of bonding. Paolo Abeni suggested fixing this by changing the bonding
      code only, which should be more reasonable and have less impact.
      
      Fix this by discarding the lowest hash bit because it contains little entropy.
      After the fix we can re-balance between slaves.
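
      The effect of dropping the lowest bit can be sketched with a toy slave
      selection. Using the source port directly as the hash is a deliberate
      simplification, and both function names are hypothetical.

```c
#include <stdint.h>

/* Toy selection: with only even hashes and 2 slaves, this always picks 0. */
static unsigned int pick_slave_naive(uint32_t hash, unsigned int slaves)
{
	return hash % slaves;
}

/* Discard the lowest bit first, so consecutive even hashes spread out. */
static unsigned int pick_slave_shifted(uint32_t hash, unsigned int slaves)
{
	return (hash >> 1) % slaves;
}
```

      With two slaves and even source ports 40000 and 40002, the naive form
      picks the same slave twice while the shifted form picks different ones.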
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f239c06
    • Ye Yin's avatar
      netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed · afd9fa66
      Ye Yin authored
      
      [ Upstream commit 2b5ec1a5 ]
      
      When running ipvs in two different network namespaces on the same host,
      with one ipvs forwarding network traffic to the ipvs in the other network
      namespace, the 'ipvs_property' flag makes the second ipvs have no effect.
      So we should clear 'ipvs_property' when the SKB's network namespace changes.
      
      Fixes: 621e84d6 ("dev: introduce skb_scrub_packet()")
      Signed-off-by: Ye Yin <hustcat@gmail.com>
      Signed-off-by: Wei Zhou <chouryzhou@gmail.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      afd9fa66
    • Eric Dumazet's avatar
      tcp: do not mangle skb->cb[] in tcp_make_synack() · 3920a5bd
      Eric Dumazet authored
      
      [ Upstream commit 3b117750 ]
      
      Christoph Paasch sent a patch to address the following issue :
      
      tcp_make_synack() is leaving some TCP private info in skb->cb[],
      then sends the packet by means other than tcp_transmit_skb()
      
      tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
      IPv4/IPv6 stacks, but we have no such cleanup for SYNACK.
      
      tcp_make_synack() should not use tcp_init_nondata_skb() :
      
      tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
      queues (the ones that are only sent via tcp_transmit_skb())
      
      This patch fixes the issue and should even save a few cpu cycles ;)
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3920a5bd
    • Jeff Barnhill's avatar
      net: vrf: correct FRA_L3MDEV encode type · 58b21b02
      Jeff Barnhill authored
      
      [ Upstream commit 18129a24 ]
      
      FRA_L3MDEV is defined as U8, but is being added as a U32 attribute. On
      big endian architectures, this results in the l3mdev entry not being
      added to the FIB rules.
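
      Why endianness matters here can be illustrated by serializing a 32-bit
      value and reading back only its first byte, as a reader expecting a U8
      would. The helpers are hypothetical and model the byte layout only, not
      the netlink attribute code.

```c
#include <stdint.h>

/* Store a 32-bit value the way a big endian CPU lays it out in memory. */
static void put_be32_sketch(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Store a 32-bit value the way a little endian CPU lays it out in memory. */
static void put_le32_sketch(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)v;
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* A reader expecting a U8 payload looks only at the first byte. */
static uint8_t read_u8_sketch(const uint8_t *payload)
{
	return payload[0];
}
```

      Writing the value 1 as 32 bits reads back as 1 in the little endian
      layout but as 0 in the big endian layout, which would explain why the
      problem only shows up on big endian machines.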
      
      Fixes: 1aa6c4f6 ("net: vrf: Add l3mdev rules on first device create")
      Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      58b21b02
    • Konstantin Khlebnikov's avatar
      tcp_nv: fix division by zero in tcpnv_acked() · b0e50c4e
      Konstantin Khlebnikov authored
      
      [ Upstream commit 4eebff27 ]
      
      Average RTT could become zero. This happened in real life at least twice.
      This patch treats zero as 1us.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Lawrence Brakmo <Brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0e50c4e
  3. 21 Nov, 2017 3 commits