1. 01 Feb, 2020 1 commit
    • Thomas Gleixner's avatar
      x86/apic/msi: Plug non-maskable MSI affinity race · 6f1a4891
      Thomas Gleixner authored
      Evan tracked down a subtle race between the update of the MSI message and
      the device raising an interrupt internally on PCI devices which do not
      support MSI masking. The update of the MSI message is non-atomic and
      consists of either 2 or 3 sequential 32bit wide writes to the PCI config
      space.
      
         - Write address low 32bits
         - Write address high 32bits (If supported by device)
         - Write data
      
      When an interrupt is migrated then both address and data might change, so
      the kernel attempts to mask the MSI interrupt first. But for MSI masking is
      optional, so there exist devices which do not provide it. That means that
      if the device raises an interrupt internally between the writes then a MSI
      message is sent built from half updated state.
      
      On x86 this can lead to spurious interrupts on the wrong interrupt
      vector when the affinity setting changes both address and data. As a
      consequence the device interrupt can be lost causing the device to
      become stuck or malfunctioning.
      
      Evan tried to handle that by disabling MSI accross an MSI message
      update. That's not feasible because disabling MSI has issues on its own:
      
       If MSI is disabled the PCI device is routing an interrupt to the legacy
       INTx mechanism. The INTx delivery can be disabled, but the disablement is
       not working on all devices.
      
       Some devices lose interrupts when both MSI and INTx delivery are disabled.
      
      Another way to solve this would be to enforce the allocation of the same
      vector on all CPUs in the system for this kind of screwed devices. That
      could be done, but it would bring back the vector space exhaustion problems
      which got solved a few years ago.
      
      Fortunately the high address (if supported by the device) is only relevant
      when X2APIC is enabled which implies interrupt remapping. In the interrupt
      remapping case the affinity setting is happening at the interrupt remapping
      unit and the PCI MSI message is programmed only once when the PCI device is
      initialized.
      
      That makes it possible to solve it with a two step update:
      
        1) Target the MSI msg to the new vector on the current target CPU
      
        2) Target the MSI msg to the new vector on the new target CPU
      
      In both cases writing the MSI message is only changing a single 32bit word
      which prevents the issue of inconsistency.
      
      After writing the final destination it is necessary to check whether the
      device issued an interrupt while the intermediate state #1 (new vector,
      current CPU) was in effect.
      
      This is possible because the affinity change is always happening on the
      current target CPU. The code runs with interrupts disabled, so the
      interrupt can be detected by checking the IRR of the local APIC. If the
      vector is pending in the IRR then the interrupt is retriggered on the new
      target CPU by sending an IPI for the associated vector on the target CPU.
      
      This can cause spurious interrupts on both the local and the new target
      CPU.
      
       1) If the new vector is not in use on the local CPU and the device
          affected by the affinity change raised an interrupt during the
          transitional state (step #1 above) then interrupt entry code will
          ignore that spurious interrupt. The vector is marked so that the
          'No irq handler for vector' warning is supressed once.
      
       2) If the new vector is in use already on the local CPU then the IRR check
          might see an pending interrupt from the device which is using this
          vector. The IPI to the new target CPU will then invoke the handler of
          the device, which got the affinity change, even if that device did not
          issue an interrupt
      
       3) If the new vector is in use already on the local CPU and the device
          affected by the affinity change raised an interrupt during the
          transitional state (step #1 above) then the handler of the device which
          uses that vector on the local CPU will be invoked.
      
      expose issues in device driver interrupt handlers which are not prepared to
      handle a spurious interrupt correctly. This not a regression, it's just
      exposing something which was already broken as spurious interrupts can
      happen for a lot of reasons and all driver handlers need to be able to deal
      with them.
      Reported-by: default avatarEvan Green <evgreen@chromium.org>
      Debugged-by: default avatarEvan Green <evgreen@chromium.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarEvan Green <evgreen@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
      6f1a4891
  2. 31 Jan, 2020 1 commit
  3. 29 Jan, 2020 1 commit
  4. 28 Jan, 2020 1 commit
  5. 22 Jan, 2020 1 commit
    • Masami Hiramatsu's avatar
      x86/decoder: Add TEST opcode to Group3-2 · 8b7e20a7
      Masami Hiramatsu authored
      Add TEST opcode to Group3-2 reg=001b as same as Group3-1 does.
      
      Commit
      
        12a78d43 ("x86/decoder: Add new TEST instruction pattern")
      
      added a TEST opcode assignment to f6 XX/001/XXX (Group 3-1), but did
      not add f7 XX/001/XXX (Group 3-2).
      
      Actually, this TEST opcode variant (ModRM.reg /1) is not described in
      the Intel SDM Vol2 but in AMD64 Architecture Programmer's Manual Vol.3,
      Appendix A.2 Table A-6. ModRM.reg Extensions for the Primary Opcode Map.
      
      Without this fix, Randy found a warning by insn_decoder_test related
      to this issue as below.
      
          HOSTCC  arch/x86/tools/insn_decoder_test
          HOSTCC  arch/x86/tools/insn_sanity
          TEST    posttest
        arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
        arch/x86/tools/insn_decoder_test: warning: ffffffff81000bf1:	f7 0b 00 01 08 00    	testl  $0x80100,(%rbx)
        arch/x86/tools/insn_decoder_test: warning: objdump says 6 bytes, but insn_get_length() says 2
        arch/x86/tools/insn_decoder_test: warning: Decoded and checked 11913894 instructions with 1 failures
          TEST    posttest
        arch/x86/tools/insn_sanity: Success: decoded and checked 1000000 random instructions with 0 errors (seed:0x871ce29c)
      
      To fix this error, add the TEST opcode according to AMD64 APM Vol.3.
      
       [ bp: Massage commit message. ]
      Reported-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Tested-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/157966631413.9580.10311036595431878351.stgit@devnote2
      8b7e20a7
  6. 20 Jan, 2020 5 commits
    • Xiaochen Shen's avatar
      x86/resctrl: Clean up unused function parameter in mkdir path · 32ada3b9
      Xiaochen Shen authored
      Commit
      
        334b0f4e ("x86/resctrl: Fix a deadlock due to inaccurate reference")
      
      changed the argument to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
      within mkdir_rdt_prepare(). That change resulted in an unused function
      parameter to mkdir_rdt_prepare().
      
      Clean up the unused function parameter in mkdir_rdt_prepare() and its
      callers rdtgroup_mkdir_mon() and rdtgroup_mkdir_ctrl_mon().
      Signed-off-by: default avatarXiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1578500886-21771-5-git-send-email-xiaochen.shen@intel.com
      32ada3b9
    • Xiaochen Shen's avatar
      x86/resctrl: Fix a deadlock due to inaccurate reference · 334b0f4e
      Xiaochen Shen authored
      There is a race condition which results in a deadlock when rmdir and
      mkdir execute concurrently:
      
      $ ls /sys/fs/resctrl/c1/mon_groups/m1/
      cpus  cpus_list  mon_data  tasks
      
      Thread 1: rmdir /sys/fs/resctrl/c1
      Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1
      
      3 locks held by mkdir/48649:
       #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
       #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
       #2:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
      
      4 locks held by rmdir/48652:
       #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
       #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
       #2:  (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
       #3:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
      
      Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
      kernfs_remove() removes all kernfs nodes under directory "c1"
      recursively, then waits for sub kernfs node "mon_groups" to drop active
      reference.
      
      Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
      directory. The wrapper kernfs_iop_mkdir() takes an active reference to
      the "mon_groups" directory but the code drops the active reference to
      the parent directory "c1" instead.
      
      As a result, Thread 1 is blocked on waiting for active reference to drop
      and never release rdtgroup_mutex, while Thread 2 is also blocked on
      trying to get rdtgroup_mutex.
      
      Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
      (rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
      -------------------------   -------------------------
                                  kernfs_iop_mkdir
                                    /*
                                     * kn: "m1", parent_kn: "mon_groups",
                                     * prgrp_kn: parent_kn->parent: "c1",
                                     *
                                     * "mon_groups", parent_kn->active++: 1
                                     */
                                    kernfs_get_active(parent_kn)
      kernfs_iop_rmdir
        /* "c1", kn->active++ */
        kernfs_get_active(kn)
      
        rdtgroup_kn_lock_live
          atomic_inc(&rdtgrp->waitcount)
          /* "c1", kn->active-- */
          kernfs_break_active_protection(kn)
          mutex_lock
      
        rdtgroup_rmdir_ctrl
          free_all_child_rdtgrp
            sentry->flags = RDT_DELETED
      
          rdtgroup_ctrl_remove
            rdtgrp->flags = RDT_DELETED
            kernfs_get(kn)
            kernfs_remove(rdtgrp->kn)
              __kernfs_remove
                /* "mon_groups", sub_kn */
                atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
                kernfs_drain(sub_kn)
                  /*
                   * sub_kn->active == KN_DEACTIVATED_BIAS + 1,
                   * waiting on sub_kn->active to drop, but it
                   * never drops in Thread 2 which is blocked
                   * on getting rdtgroup_mutex.
                   */
      Thread 1 hangs here ---->
                  wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
                  ...
                                    rdtgroup_mkdir
                                      rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                        mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                          rdtgroup_kn_lock_live(prgrp_kn)
                                            atomic_inc(&rdtgrp->waitcount)
                                            /*
                                             * "c1", prgrp_kn->active--
                                             *
                                             * The active reference on "c1" is
                                             * dropped, but not matching the
                                             * actual active reference taken
                                             * on "mon_groups", thus causing
                                             * Thread 1 to wait forever while
                                             * holding rdtgroup_mutex.
                                             */
                                            kernfs_break_active_protection(
                                                                     prgrp_kn)
                                            /*
                                             * Trying to get rdtgroup_mutex
                                             * which is held by Thread 1.
                                             */
      Thread 2 hangs here ---->             mutex_lock
                                            ...
      
      The problem is that the creation of a subdirectory in the "mon_groups"
      directory incorrectly releases the active protection of its parent
      directory instead of itself before it starts waiting for rdtgroup_mutex.
      This is triggered by the rdtgroup_mkdir() flow calling
      rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
      parent control group ("c1") as argument. It should be called with kernfs
      node "mon_groups" instead. What is currently missing is that the
      kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.
      
      Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
      it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
      instead. And then it operates on the same rdtgroup structure but handles
      the active reference of kernfs node "mon_groups" to prevent deadlock.
      The same changes are also made to the "mon_data" directories.
      
      This results in some unused function parameters that will be cleaned up
      in follow-up patch as the focus here is on the fix only in support of
      backporting efforts.
      
      Fixes: c7d9aac6 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
      Suggested-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: default avatarXiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
      334b0f4e
    • Xiaochen Shen's avatar
      x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup · 074fadee
      Xiaochen Shen authored
      There is a race condition in the following scenario which results in an
      use-after-free issue when reading a monitoring file and deleting the
      parent ctrl_mon group concurrently:
      
      Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls
      kernfs_break_active_protection() to drop the active reference of kernfs
      node in rdtgroup_kn_lock_live().
      
      In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub
      kernfs nodes to drop the active reference when removing all subtree
      kernfs nodes recursively. Thread 2 could block on kernfs_remove() until
      Thread 1 calls kernfs_break_active_protection(). Only after
      kernfs_remove() completes the refcount of rdtgrp could be trusted.
      
      Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(),
      Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0
      instead of 1 due to the race.
      
      In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory
      (rdtgrp->waitcount) which was already freed in Thread 2 results in
      use-after-free issue.
      
      Thread 1 (rdtgroup_mondata_show)  Thread 2 (rdtgroup_rmdir)
      --------------------------------  -------------------------
      rdtgroup_kn_lock_live
        /*
         * kn active protection until
         * kernfs_break_active_protection(kn)
         */
        rdtgrp = kernfs_to_rdtgroup(kn)
                                        rdtgroup_kn_lock_live
                                          atomic_inc(&rdtgrp->waitcount)
                                          mutex_lock
                                        rdtgroup_rmdir_ctrl
                                          free_all_child_rdtgrp
                                            /*
                                             * sentry->waitcount should be 1
                                             * but is 0 now due to the race.
                                             */
                                            kfree(sentry)*[1]
        /*
         * Only after kernfs_remove()
         * completes, the refcount of
         * rdtgrp could be trusted.
         */
        atomic_inc(&rdtgrp->waitcount)
        /* kn->active-- */
        kernfs_break_active_protection(kn)
                                          rdtgroup_ctrl_remove
                                            rdtgrp->flags = RDT_DELETED
                                            /*
                                             * Blocking routine, wait for
                                             * all sub kernfs nodes to drop
                                             * active reference in
                                             * kernfs_break_active_protection.
                                             */
                                            kernfs_remove(rdtgrp->kn)
                                        rdtgroup_kn_unlock
                                          mutex_unlock
                                          atomic_dec_and_test(
                                                      &rdtgrp->waitcount)
                                          && (flags & RDT_DELETED)
                                            kernfs_unbreak_active_protection(kn)
                                            kfree(rdtgrp)
        mutex_lock
      mon_event_read
      rdtgroup_kn_unlock
        mutex_unlock
        /*
         * Use-after-free: refer to earlier rdtgrp
         * memory which was freed in [1].
         */
        atomic_dec_and_test(&rdtgrp->waitcount)
        && (flags & RDT_DELETED)
          /* kn->active++ */
          kernfs_unbreak_active_protection(kn)
          kfree(rdtgrp)
      
      Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in
      rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp.
      
      Fixes: f3cbeaca ("x86/intel_rdt/cqm: Add rmdir support")
      Suggested-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: default avatarXiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com
      074fadee
    • Xiaochen Shen's avatar
      x86/resctrl: Fix use-after-free when deleting resource groups · b8511ccc
      Xiaochen Shen authored
      A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount)
      that indicates how many waiters expect this rdtgrp to exist. Waiters
      could be waiting on rdtgroup_mutex or some work sitting on a task's
      workqueue for when the task returns from kernel mode or exits.
      
      The deletion of a rdtgrp is intended to have two phases:
      
        (1) while holding rdtgroup_mutex the necessary cleanup is done and
        rdtgrp->flags is set to RDT_DELETED,
      
        (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed
        only if there are no waiters and its flag is set to RDT_DELETED. Upon
        gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check
        for the RDT_DELETED flag.
      
      When unmounting the resctrl file system or deleting ctrl_mon groups,
      all of the subdirectories are removed and the data structure of rdtgrp
      is forcibly freed without checking rdtgrp->waitcount. If at this point
      there was a waiter on rdtgrp then a use-after-free issue occurs when the
      waiter starts running and accesses the rdtgrp structure it was waiting
      on.
      
      See kfree() calls in [1], [2] and [3] in these two call paths in
      following scenarios:
      (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp()
      (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()
      
      There are several scenarios that result in use-after-free issue in
      following:
      
      Scenario 1:
      -----------
      In Thread 1, rdtgroup_tasks_write() adds a task_work callback
      move_myself(). If move_myself() is scheduled to execute after Thread 2
      rdt_kill_sb() is finished, referring to earlier rdtgrp memory
      (rdtgrp->waitcount) which was already freed in Thread 2 results in
      use-after-free issue.
      
      Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdt_kill_sb)
      -------------------------------        ----------------------
      rdtgroup_kn_lock_live
        atomic_inc(&rdtgrp->waitcount)
        mutex_lock
      rdtgroup_move_task
        __rdtgroup_move_task
          /*
           * Take an extra refcount, so rdtgrp cannot be freed
           * before the call back move_myself has been invoked
           */
          atomic_inc(&rdtgrp->waitcount)
          /* Callback move_myself will be scheduled for later */
          task_work_add(move_myself)
      rdtgroup_kn_unlock
        mutex_unlock
        atomic_dec_and_test(&rdtgrp->waitcount)
        && (flags & RDT_DELETED)
                                             mutex_lock
                                             rmdir_all_sub
                                               /*
                                                * sentry and rdtgrp are freed
                                                * without checking refcount
                                                */
                                               free_all_child_rdtgrp
                                                 kfree(sentry)*[1]
                                               kfree(rdtgrp)*[2]
                                             mutex_unlock
      /*
       * Callback is scheduled to execute
       * after rdt_kill_sb is finished
       */
      move_myself
        /*
         * Use-after-free: refer to earlier rdtgrp
         * memory which was freed in [1] or [2].
         */
        atomic_dec_and_test(&rdtgrp->waitcount)
        && (flags & RDT_DELETED)
          kfree(rdtgrp)
      
      Scenario 2:
      -----------
      In Thread 1, rdtgroup_tasks_write() adds a task_work callback
      move_myself(). If move_myself() is scheduled to execute after Thread 2
      rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory
      (rdtgrp->waitcount) which was already freed in Thread 2 results in
      use-after-free issue.
      
      Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdtgroup_rmdir)
      -------------------------------        -------------------------
      rdtgroup_kn_lock_live
        atomic_inc(&rdtgrp->waitcount)
        mutex_lock
      rdtgroup_move_task
        __rdtgroup_move_task
          /*
           * Take an extra refcount, so rdtgrp cannot be freed
           * before the call back move_myself has been invoked
           */
          atomic_inc(&rdtgrp->waitcount)
          /* Callback move_myself will be scheduled for later */
          task_work_add(move_myself)
      rdtgroup_kn_unlock
        mutex_unlock
        atomic_dec_and_test(&rdtgrp->waitcount)
        && (flags & RDT_DELETED)
                                             rdtgroup_kn_lock_live
                                               atomic_inc(&rdtgrp->waitcount)
                                               mutex_lock
                                             rdtgroup_rmdir_ctrl
                                               free_all_child_rdtgrp
                                                 /*
                                                  * sentry is freed without
                                                  * checking refcount
                                                  */
                                                 kfree(sentry)*[3]
                                               rdtgroup_ctrl_remove
                                                 rdtgrp->flags = RDT_DELETED
                                             rdtgroup_kn_unlock
                                               mutex_unlock
                                               atomic_dec_and_test(
                                                           &rdtgrp->waitcount)
                                               && (flags & RDT_DELETED)
                                                 kfree(rdtgrp)
      /*
       * Callback is scheduled to execute
       * after rdt_kill_sb is finished
       */
      move_myself
        /*
         * Use-after-free: refer to earlier rdtgrp
         * memory which was freed in [3].
         */
        atomic_dec_and_test(&rdtgrp->waitcount)
        && (flags & RDT_DELETED)
          kfree(rdtgrp)
      
      If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed
      like following. Note that "0x6b" is POISON_FREE after kfree(). The
      corrupted bits "0x6a", "0x64" at offset 0x424 correspond to
      waitcount member of struct rdtgroup which was freed:
      
        Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048
        420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
        Single bit error detected. Probably bad RAM.
        Run memtest86+ or a similar memory test tool.
        Next obj: start=ffff9504c5b0d800, len=2048
        000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      
        Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048
        420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkdkkkkkkkkkkk
        Prev obj: start=ffff9504c58ab000, len=2048
        000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      
      Fix this by taking reference count (waitcount) of rdtgrp into account in
      the two call paths that currently do not do so. Instead of always
      freeing the resource group it will only be freed if there are no waiters
      on it. If there are waiters, the resource group will have its flags set
      to RDT_DELETED.
      
      It will be left to the waiter to free the resource group when it starts
      running and finding that it was the last waiter and the resource group
      has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb()
      -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() ->
      rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()
      
      Fixes: f3cbeaca ("x86/intel_rdt/cqm: Add rmdir support")
      Fixes: 60cf5e10 ("x86/intel_rdt: Add mkdir to resctrl file system")
      Suggested-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: default avatarXiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com
      b8511ccc
    • Linus Torvalds's avatar
      Linux 5.5-rc7 · def9d278
      Linus Torvalds authored
      def9d278
  7. 19 Jan, 2020 8 commits
    • Linus Torvalds's avatar
      Merge tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 7008ee12
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "Three fixes for RISC-V:
      
         - Don't free and reuse memory containing the code that CPUs parked at
           boot reside in.
      
         - Fix rv64 build problems for ubsan and some modules by adding
           logical and arithmetic shift helpers for 128-bit values. These are
           from libgcc and are similar to what's present for ARM64.
      
         - Fix vDSO builds to clean up their own temporary files"
      
      * tag 'riscv/for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Less inefficient gcc tishift helpers (and export their symbols)
        riscv: delete temporary files
        riscv: make sure the cores stay looping in .Lsecondary_park
      7008ee12
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 11a82729
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix non-blocking connect() in x25, from Martin Schiller.
      
       2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.
      
       3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.
      
       4) Limit size of TSO packets properly in lan78xx driver, from Eric
          Dumazet.
      
       5) r8152 probe needs an endpoint sanity check, from Johan Hovold.
      
       6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
          John Fastabend.
      
       7) hns3 needs short frames padded on transmit, from Yunsheng Lin.
      
       8) Fix netfilter ICMP header corruption, from Eyal Birger.
      
       9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.
      
      10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.
      
      11) Fix memory leak in act_ctinfo, from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        cxgb4: reject overlapped queues in TC-MQPRIO offload
        cxgb4: fix Tx multi channel port rate limit
        net: sched: act_ctinfo: fix memory leak
        bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
        bnxt_en: Fix ipv6 RFS filter matching logic.
        bnxt_en: Fix NTUPLE firmware command failures.
        net: systemport: Fixed queue mapping in internal ring map
        net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
        net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
        net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
        net: hns: fix soft lockup when there is not enough memory
        net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
        net/sched: act_ife: initalize ife->metalist earlier
        netfilter: nat: fix ICMP header corruption on ICMP errors
        net: wan: lapbether.c: Use built-in RCU list checking
        netfilter: nf_tables: fix flowtable list del corruption
        netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
        netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
        netfilter: nft_tunnel: ERSPAN_VERSION must not be null
        netfilter: nft_tunnel: fix null-attribute check
        ...
      11a82729
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5f436443
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Two runtime PM fixes and one leak fix"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: iop3xx: Fix memory leak in probe error path
        i2c: tegra: Properly disable runtime PM on driver's probe error
        i2c: tegra: Fix suspending in active runtime PM state
      5f436443
    • Rahul Lakkireddy's avatar
      cxgb4: reject overlapped queues in TC-MQPRIO offload · b2383ad9
      Rahul Lakkireddy authored
      A queue can't belong to multiple traffic classes. So, reject
      any such configuration that results in overlapped queues for a
      traffic class.
      
      Fixes: b1396c2b ("cxgb4: parse and configure TC-MQPRIO offload")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b2383ad9
    • Rahul Lakkireddy's avatar
      cxgb4: fix Tx multi channel port rate limit · c856e2b6
      Rahul Lakkireddy authored
      T6 can support 2 egress traffic management channels per port to
      double the total number of traffic classes that can be configured.
      In this configuration, if the class belongs to the other channel,
      then all the queues must be bound again explicitly to the new class,
      for the rate limit parameters on the other channel to take effect.
      
      So, always explicitly bind all queues to the port rate limit traffic
      class, regardless of the traffic management channel that it belongs
      to. Also, only bind queues to port rate limit traffic class, if all
      the queues don't already belong to an existing different traffic
      class.
      
      Fixes: 4ec4762d ("cxgb4: add TC-MATCHALL classifier egress offload")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c856e2b6
    • Eric Dumazet's avatar
      net: sched: act_ctinfo: fix memory leak · 09d4f10a
      Eric Dumazet authored
      Implement a cleanup method to properly free ci->params
      
      BUG: memory leak
      unreferenced object 0xffff88811746e2c0 (size 64):
        comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00  .4`.............
        backtrace:
          [<0000000015aa236f>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<0000000015aa236f>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<0000000015aa236f>] slab_alloc mm/slab.c:3320 [inline]
          [<0000000015aa236f>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000002c946bd1>] kmalloc include/linux/slab.h:556 [inline]
          [<000000002c946bd1>] kzalloc include/linux/slab.h:670 [inline]
          [<000000002c946bd1>] tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236
          [<0000000086952cca>] tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944
          [<000000005ab29bf8>] tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000
          [<00000000392f56f9>] tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410
          [<0000000088f3c5dd>] tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465
          [<000000006b39d986>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000fd6ecace>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<0000000047493d02>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000bdcf8286>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000bdcf8286>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000fc5b92d9>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<00000000da84d076>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<00000000da84d076>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<0000000042fb2eee>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008f23f67e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000d838e4f6>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000289a9cb1>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000289a9cb1>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000289a9cb1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 24ec483c ("net: sched: Introduce act_ctinfo action")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: default avatarKevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09d4f10a
    • Olof Johansson's avatar
      riscv: Less inefficient gcc tishift helpers (and export their symbols) · fc585d4a
      Olof Johansson authored
      The existing __lshrti3 was really inefficient, and the other two helpers
      are also needed to compile some modules.
      
      Add the missing versions, and export all of the symbols like arm64
      already does.
      
      This code is based on the assembly generated by libgcc builds.
      
      This fixes a build break triggered by ubsan:
      
      riscv64-unknown-linux-gnu-ld: lib/ubsan.o: in function `.L2':
      ubsan.c:(.text.unlikely+0x38): undefined reference to `__ashlti3'
      riscv64-unknown-linux-gnu-ld: ubsan.c:(.text.unlikely+0x42): undefined reference to `__ashrti3'
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      [paul.walmsley@sifive.com: use SYM_FUNC_{START,END} instead of
       ENTRY/ENDPROC; note libgcc origin]
      Signed-off-by: default avatarPaul Walmsley <paul.walmsley@sifive.com>
      fc585d4a
    • Linus Torvalds's avatar
      Merge tag 'mtd/fixes-for-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 8f8972a3
      Linus Torvalds authored
      Pull MTD fixes from Miquel Raynal:
       "Raw NAND:
         - GPMI: Fix the suspend/resume
      
        SPI-NOR:
         - Fix quad enable on Spansion like flashes
         - Fix selection of 4-byte addressing opcodes on Spansion"
      
      * tag 'mtd/fixes-for-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: rawnand: gpmi: Restore nfc timing setup after suspend/resume
        mtd: rawnand: gpmi: Fix suspend/resume problem
        mtd: spi-nor: Fix quad enable for Spansion like flashes
        mtd: spi-nor: Fix selection of 4-byte addressing opcodes on Spansion
      8f8972a3
  8. 18 Jan, 2020 22 commits
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm · 244dc268
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Back from LCA2020, fixes wasn't too busy last week, seems to have
        quieten down appropriately, some amdgpu, i915, then a core mst fix and
        one fix for virtio-gpu and one for rockchip:
      
        core mst:
         - serialize down messages and clear timeslots are on unplug
      
        amdgpu:
         - Update golden settings for renoir
         - eDP fix
      
        i915:
         - uAPI fix: Remove dash and colon from PMU names to comply with
           tools/perf
         - Fix for include file that was indirectly included
         - Two fixes to make sure VMA are marked active for error capture
      
        virtio:
         - maintain obj reservation lock when submitting cmds
      
        rockchip:
         - increase link rate var size to accommodate rates"
      
      * tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm:
        drm/amd/display: Reorder detect_edp_sink_caps before link settings read.
        drm/amdgpu: update goldensetting for renoir
        drm/dp_mst: Have DP_Tx send one msg at a time
        drm/dp_mst: clear time slots for ports invalid
        drm/i915/pmu: Do not use colons or dashes in PMU names
        drm/rockchip: fix integer type used for storing dp data rate
        drm/i915/gt: Mark ring->vma as active while pinned
        drm/i915/gt: Mark context->state vma as active while pinned
        drm/i915/gt: Skip trying to unbind in restore_ggtt_mappings
        drm/i915: Add missing include file <linux/math64.h>
        drm/virtio: add missing virtio_gpu_array_lock_resv call
      244dc268
    • Ilie Halip's avatar
      riscv: delete temporary files · 95f4d9cc
      Ilie Halip authored
      Temporary files used in the VDSO build process linger on even after make
      mrproper: vdso-dummy.o.tmp, vdso.so.dbg.tmp.
      
      Delete them once they're no longer needed.
      Signed-off-by: default avatarIlie Halip <ilie.halip@gmail.com>
      Signed-off-by: default avatarPaul Walmsley <paul.walmsley@sifive.com>
      95f4d9cc
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0cc2682d
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - a resctrl fix for uninitialized objects found by debugobjects
      
         - a resctrl memory leak fix
      
         - fix the unintended re-enabling of the of SME and SEV CPU flags if
           memory encryption was disabled at bootup via the MSR space"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
        x86/resctrl: Fix potential memory leak
        x86/resctrl: Fix an imbalance in domain_remove_cpu()
      0cc2682d
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7ff15cd0
      Linus Torvalds authored
      Pull timer fixes from Ingo Molnar:
       "Three fixes: fix link failure on Alpha, fix a Sparse warning and
        annotate/robustify a lockless access in the NOHZ code"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick/sched: Annotate lockless access to last_jiffies_update
        lib/vdso: Make __cvdso_clock_getres() static
        time/posix-stubs: Provide compat itimer supoprt for alpha
      7ff15cd0
    • Linus Torvalds's avatar
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9e79c523
      Linus Torvalds authored
      Pull cpu/SMT fix from Ingo Molnar:
       "Fix a build bug on CONFIG_HOTPLUG_SMT=y && !CONFIG_SYSFS kernels"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/SMT: Fix x86 link error without CONFIG_SYSFS
      9e79c523
    • Linus Torvalds's avatar
      Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a186c112
      Linus Torvalds authored
      Pull x86 RAS fix from Ingo Molnar:
       "Fix a thermal throttling race that can result in easy to trigger boot
        crashes on certain Ice Lake platforms"
      
      * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce/therm_throt: Do not access uninitialized therm_work
      a186c112
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b07b9e8d
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes, three Intel uncore driver fixes, plus an AUX events fix
        uncovered by the perf fuzzer"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Remove PCIe3 unit for SNR
        perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events
        perf/x86/intel/uncore: Add PCI ID of IMC for Xeon E3 V5 Family
        perf: Correctly handle failed perf_get_aux_event()
        perf hists: Fix variable name's inconsistency in hists__for_each() macro
        perf map: Set kmap->kmaps backpointer for main kernel map chunks
        perf report: Fix incorrectly added dimensions as switch perf data file
        tools lib traceevent: Fix memory leakage in filter_event
      b07b9e8d
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 124b5547
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "Three fixes:
      
          - Fix an rwsem spin-on-owner crash, introduced in v5.4
      
          - Fix a lockdep bug when running out of stack_trace entries,
            introduced in v5.4
      
          - Docbook fix"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
        futex: Fix kernel-doc notation warning
        locking/lockdep: Fix buffer overrun problem in stack_trace[]
      124b5547
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a1c6f87e
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "Fix a recent regression in the Ingenic SoCs irqchip driver that floods
        the syslog"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/ingenic: Get rid of the legacy IRQ domain
      a1c6f87e
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e2f73d1e
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Three EFI fixes:
      
         - Fix a slow-boot-scrolling regression but making sure we use WC for
           EFI earlycon framebuffer mappings on x86
      
         - Fix a mixed EFI mode boot crash
      
         - Disable paging explicitly before entering startup_32() in mixed
           mode bootup"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/efistub: Disable paging at mixed mode entry
        efi/libstub/random: Initialize pointer variables to zero for mixed mode
        efi/earlycon: Fix write-combine mapping on x86
      e2f73d1e
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ba0f4722
      Linus Torvalds authored
      Pull rseq fixes from Ingo Molnar:
       "Two rseq bugfixes:
      
         - CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end
           up corrupting the TLS of the parent. Technically a change in the
           ABI but the previous behavior couldn't resonably have been relied
           on by applications so this looks like a valid exception to the ABI
           rule.
      
         - Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the
           handling of other flags. This is not thought to impact any
           applications either"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rseq: Unregister rseq for clone CLONE_VM
        rseq: Reject unknown flags on rseq unregister
      ba0f4722
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 8cac8990
      Linus Torvalds authored
      Pull thread fixes from Christian Brauner:
       "Here is an urgent fix for ptrace_may_access() permission checking.
      
        Commit 69f594a3 ("ptrace: do not audit capability check when
        outputing /proc/pid/stat") introduced the ability to opt out of audit
        messages for accesses to various proc files since they are not
        violations of policy.
      
        While doing so it switched the check from ns_capable() to
        has_ns_capability{_noaudit}(). That means it switched from checking
        the subjective credentials (ktask->cred) of the task to using the
        objective credentials (ktask->real_cred). This is appears to be wrong.
        ptrace_has_cap() is currently only used in ptrace_may_access() And is
        used to check whether the calling task (subject) has the
        CAP_SYS_PTRACE capability in the provided user namespace to operate on
        the target task (object). According to the cred.h comments this means
        the subjective credentials of the calling task need to be used.
      
        With this fix we switch ptrace_has_cap() to use security_capable() and
        thus back to using the subjective credentials.
      
        As one example where this might be particularly problematic, Jann
        pointed out that in combination with the upcoming IORING_OP_OPENAT{2}
        feature, this bug might allow unprivileged users to bypass the
        capability checks while asynchronously opening files like /proc/*/mem,
        because the capability checks for this would be performed against
        kernel credentials.
      
        To illustrate on the former point about this being exploitable: When
        io_uring creates a new context it records the subjective credentials
        of the caller. Later on, when it starts to do work it creates a kernel
        thread and registers a callback. The callback runs with kernel creds
        for ktask->real_cred and ktask->cred.
      
        To prevent this from becoming a full-blown 0-day io_uring will call
        override_cred() and override ktask->cred with the subjective
        credentials of the creator of the io_uring instance. With
        ptrace_has_cap() currently looking at ktask->real_cred this override
        will be ineffective and the caller will be able to open arbitray proc
        files as mentioned above.
      
        Luckily, this is currently not exploitable but would be so once
        IORING_OP_OPENAT{2} land in v5.6. Let's fix it now.
      
        To minimize potential regressions I successfully ran the criu
        testsuite. criu makes heavy use of ptrace() and extensively hits
        ptrace_may_access() codepaths and has a good change of detecting any
        regressions.
      
        Additionally, I succesfully ran the ptrace and seccomp kernel tests"
      
      * tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
      8cac8990
    • Linus Torvalds's avatar
      Merge tag 's390-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 2324de6f
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix printing misleading Secure-IPL enabled message when it is not.
      
       - Fix a race condition between host ap bus and guest ap bus doing
         device reset in crypto code.
      
       - Fix sanity check in CCA cipher key function (CCA AES cipher key
         support), which fails otherwise.
      
      * tag 's390-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/setup: Fix secure ipl message
        s390/zcrypt: move ap device reset from bus to driver code
        s390/zcrypt: Fix CCA cipher key gen with clear key value function
      2324de6f
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8965de70
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three fixes in drivers with no impact to core code.
      
        The mptfusion fix is enormous because the driver API had to be
        rethreaded to pass down the necessary iocp pointer, but once that's
        done a significant chunk of code is deleted.
      
        The other two patches are small"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: mptfusion: Fix double fetch bug in ioctl
        scsi: storvsc: Correctly set number of hardware queues for IDE disk
        scsi: fnic: fix invalid stack access
      8965de70
    • Linus Torvalds's avatar
      Merge tag 'char-misc-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · f04dba64
      Linus Torvalds authored
      Pull char/misc fixes from Greg KH:
       "Here are some small fixes for 5.5-rc7
      
        Included here are:
      
         -  two lkdtm fixes
      
         -  coresight build fix
      
         -  Documentation update for the hw process document
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Documentation/process: Add Amazon contact for embargoed hardware issues
        lkdtm/bugs: fix build error in lkdtm_UNSET_SMEP
        lkdtm/bugs: Make double-fault test always available
        coresight: etm4x: Fix unused function warning
      f04dba64
    • Linus Torvalds's avatar
      Merge tag 'staging-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · bf3f401d
      Linus Torvalds authored
      Pull staging and IIO driver fixes from Greg KH:
       "Here are some small staging and iio driver fixes for 5.5-rc7
      
        All of them are for some small reported issues. Nothing major, full
        details in the shortlog.
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: comedi: ni_routes: allow partial routing information
        staging: comedi: ni_routes: fix null dereference in ni_find_route_source()
        iio: light: vcnl4000: Fix scale for vcnl4040
        iio: buffer: align the size of scan bytes to size of the largest element
        iio: chemical: pms7003: fix unmet triggered buffer dependency
        iio: imu: st_lsm6dsx: Fix selection of ST_LSM6DS3_ID
        iio: adc: ad7124: Fix DT channel configuration
      bf3f401d
    • Linus Torvalds's avatar
      Merge tag 'usb-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c5fd2c5b
      Linus Torvalds authored
      Pull USB driver fixes from Greg KH:
       "Here are some small USB driver and core fixes for 5.5-rc7
      
        There's one fix for hub wakeup issues and a number of small usb-serial
        driver fixes and device id updates.
      
        The hub fix has been in linux-next for a while with no reported
        issues, and the usb-serial ones have all passed 0-day with no
        problems"
      
      * tag 'usb-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: quatech2: handle unbound ports
        USB: serial: keyspan: handle unbound ports
        USB: serial: io_edgeport: add missing active-port sanity check
        USB: serial: io_edgeport: handle unbound ports on URB completion
        USB: serial: ch341: handle unbound port at reset_resume
        USB: serial: suppress driver bind attributes
        USB: serial: option: add support for Quectel RM500Q in QDL mode
        usb: core: hub: Improved device recognition on remote wakeup
        USB: serial: opticon: fix control-message timeouts
        USB: serial: option: Add support for Quectel RM500Q
        USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx
      c5fd2c5b
    • David S. Miller's avatar
      Merge branch 'bnxt_en-fixes' · e02d9c4c
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      3 small bug fix patches.  The 1st two are aRFS fixes and the last one
      fixes a fatal driver load failure on some kernels without PCIe
      extended config space support enabled.
      
      Please also queue these for -stable.  Thanks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e02d9c4c
    • Michael Chan's avatar
      bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal. · d061b241
      Michael Chan authored
      DSN read can fail, for example on a kdump kernel without PCIe extended
      config space support.  If DSN read fails, don't set the
      BNXT_FLAG_DSN_VALID flag and continue loading.  Check the flag
      to see if the stored DSN is valid before using it.  Only VF reps
      creation should fail without valid DSN.
      
      Fixes: 03213a99 ("bnxt: move bp->switch_id initialization to PF probe")
      Reported-by: default avatarMarc Smith <msmith626@gmail.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d061b241
    • Michael Chan's avatar
      bnxt_en: Fix ipv6 RFS filter matching logic. · 6fc7caa8
      Michael Chan authored
      Fix bnxt_fltr_match() to match ipv6 source and destination addresses.
      The function currently only checks ipv4 addresses and will not work
      corrently on ipv6 filters.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6fc7caa8
    • Michael Chan's avatar
      bnxt_en: Fix NTUPLE firmware command failures. · ceb3284c
      Michael Chan authored
      The NTUPLE related firmware commands are sent to the wrong firmware
      channel, causing all these commands to fail on new firmware that
      supports the new firmware channel.  Fix it by excluding the 3
      NTUPLE firmware commands from the list for the new firmware channel.
      
      Fixes: 760b6d33 ("bnxt_en: Add support for 2nd firmware message channel.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ceb3284c
    • Christian Brauner's avatar
      ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() · 6b3ad664
      Christian Brauner authored
      Commit 69f594a3 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
      introduced the ability to opt out of audit messages for accesses to various
      proc files since they are not violations of policy.  While doing so it
      somehow switched the check from ns_capable() to
      has_ns_capability{_noaudit}(). That means it switched from checking the
      subjective credentials of the task to using the objective credentials. This
      is wrong since. ptrace_has_cap() is currently only used in
      ptrace_may_access() And is used to check whether the calling task (subject)
      has the CAP_SYS_PTRACE capability in the provided user namespace to operate
      on the target task (object). According to the cred.h comments this would
      mean the subjective credentials of the calling task need to be used.
      This switches ptrace_has_cap() to use security_capable(). Because we only
      call ptrace_has_cap() in ptrace_may_access() and in there we already have a
      stable reference to the calling task's creds under rcu_read_lock() there's
      no need to go through another series of dereferences and rcu locking done
      in ns_capable{_noaudit}().
      
      As one example where this might be particularly problematic, Jann pointed
      out that in combination with the upcoming IORING_OP_OPENAT feature, this
      bug might allow unprivileged users to bypass the capability checks while
      asynchronously opening files like /proc/*/mem, because the capability
      checks for this would be performed against kernel credentials.
      
      To illustrate on the former point about this being exploitable: When
      io_uring creates a new context it records the subjective credentials of the
      caller. Later on, when it starts to do work it creates a kernel thread and
      registers a callback. The callback runs with kernel creds for
      ktask->real_cred and ktask->cred. To prevent this from becoming a
      full-blown 0-day io_uring will call override_cred() and override
      ktask->cred with the subjective credentials of the creator of the io_uring
      instance. With ptrace_has_cap() currently looking at ktask->real_cred this
      override will be ineffective and the caller will be able to open arbitray
      proc files as mentioned above.
      Luckily, this is currently not exploitable but will turn into a 0-day once
      IORING_OP_OPENAT{2} land in v5.6. Fix it now!
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarSerge Hallyn <serge@hallyn.com>
      Reviewed-by: default avatarJann Horn <jannh@google.com>
      Fixes: 69f594a3 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      6b3ad664