1. 29 Jun, 2017 25 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Save/restore host values of debug registers · 3ff8eda2
      Paul Mackerras authored
      commit 7ceaa6dc upstream.
      
      At present, HV KVM on POWER8 and POWER9 machines loses any instruction
      or data breakpoint set in the host whenever a guest is run.
      Instruction breakpoints are currently only used by xmon, but ptrace
      and the perf_event subsystem can set data breakpoints as well as xmon.
      
      To fix this, we save the host values of the debug registers (CIABR,
      DAWR and DAWRX) before entering the guest and restore them on exit.
      To provide space to save them in the stack frame, we expand the stack
      frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ff8eda2
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 324df574
      Paul Mackerras authored
      commit 4c3bb4cc upstream.
      
      This restores several special-purpose registers (SPRs) to sane values
      on guest exit that were missed before.
      
      TAR and VRSAVE are readable and writable by userspace, and we need to
      save and restore them to prevent the guest from potentially affecting
      userspace execution (not that TAR or VRSAVE are used by any known
      program that run uses the KVM_RUN ioctl).  We save/restore these
      in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.
      
      FSCR affects userspace execution in that it can prohibit access to
      certain facilities by userspace.  We restore it to the normal value
      for the task on exit from the KVM_RUN ioctl.
      
      IAMR is normally 0, and is restored to 0 on guest exit.  However,
      with a radix host on POWER9, it is set to a value that prevents the
      kernel from executing user-accessible memory.  On POWER9, we save
      IAMR on guest entry and restore it on guest exit to the saved value
      rather than 0.  On POWER8 we continue to set it to 0 on guest exit.
      
      PSPB is normally 0.  We restore it to 0 on guest exit to prevent
      userspace taking advantage of the guest having set it non-zero
      (which would allow userspace to set its SMT priority to high).
      
      UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
      the AMR from being used as a covert channel between userspace
      processes, since the AMR is not context-switched at present.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      324df574
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Context-switch EBB registers properly · 4ae3f358
      Paul Mackerras authored
      commit ca8efa1d upstream.
      
      This adds code to save the values of three SPRs (special-purpose
      registers) used by userspace to control event-based branches (EBBs),
      which are essentially interrupts that get delivered directly to
      userspace.  These registers are loaded up with guest values when
      entering the guest, and their values are saved when exiting the
      guest, but we were not saving the host values and restoring them
      before going back to userspace.
      
      On POWER8 this would only affect userspace programs which explicitly
      request the use of EBBs and also use the KVM_RUN ioctl, since the
      only source of EBBs on POWER8 is the PMU, and there is an explicit
      enable bit in the PMU registers (and those PMU registers do get
      properly context-switched between host and guest).  On POWER9 there
      is provision for externally-generated EBBs, and these are not subject
      to the control in the PMU registers.
      
      Since these registers only affect userspace, we can save them when
      we first come in from userspace and restore them before returning to
      userspace, rather than saving/restoring the host values on every
      guest entry/exit.  Similarly, we don't need to worry about their
      values on offline secondary threads since they execute in the context
      of the idle task, which never executes in userspace.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ae3f358
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · a7d29e27
      Paul Mackerras authored
      commit 3d3efb68 upstream.
      
      POWER9 DD1 has an erratum where writing to the TBU40 register, which
      is used to apply an offset to the timebase, can cause the timebase to
      lose counts.  This results in the timebase on some CPUs getting out of
      sync with other CPUs, which then results in misbehaviour of the
      timekeeping code.
      
      To work around the problem, we make KVM ignore the timebase offset for
      all guests on POWER9 DD1 machines.  This means that live migration
      cannot be supported on POWER9 DD1 machines.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7d29e27
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 4d9c6828
      Paul Mackerras authored
      commit 46a704f8 upstream.
      
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d9c6828
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Cope with host using large decrementer mode · 1e0d7996
      Paul Mackerras authored
      commit 2f272463 upstream.
      
      POWER9 introduces a new mode for the decrementer register, called
      large decrementer mode, in which the decrementer counter is 56 bits
      wide rather than 32, and reads are sign-extended rather than
      zero-extended.  For the decrementer, this new mode is optional and
      controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
      is 56 bits wide on POWER9 and has no mode control.
      
      Since KVM code reads and writes the decrementer and hypervisor
      decrementer registers in a few places, it needs to be aware of the
      need to treat the decrementer value as a 64-bit quantity, and only do
      a 32-bit sign extension when large decrementer mode is not in effect.
      Similarly, the HDEC should always be treated as a 64-bit quantity on
      POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
      test for POWER9 and the sign extension.
      
      To enable the sign extension to be removed in large decrementer mode,
      we test the LPCR_LD bit in the host LPCR image stored in the struct
      kvm for the guest.  If is set then large decrementer mode is enabled
      and the sign extension should be skipped.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e0d7996
    • Heiko Carstens's avatar
      KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows · 5d34bddd
      Heiko Carstens authored
      commit addb63c1 upstream.
      
      For real-space designation asces the asce origin part is only a token.
      The asce token origin must not be used to generate an effective
      address for storage references. This however is erroneously done
      within kvm_s390_shadow_tables().
      
      Furthermore within the same function the wrong parts of virtual
      addresses are used to generate a corresponding real address
      (e.g. the region second index is used as region first index).
      
      Both of the above can result in incorrect address translations. Only
      for real space designations with a token origin of zero and addresses
      below one megabyte the translation was correct.
      
      Furthermore replace a "!asce.r" statement with a "!*fake" statement to
      make it more obvious that a specific condition has nothing to do with
      the architecture, but with the fake handling of real space designations.
      
      Fixes: 3218f709 ("s390/mm: support real-space for gmap shadows")
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d34bddd
    • James Cowgill's avatar
      KVM: MIPS: Fix maybe-uninitialized build failure · 87ee4cdd
      James Cowgill authored
      commit e27a9eca upstream.
      
      This commit fixes a "maybe-uninitialized" build failure in
      arch/mips/kvm/tlb.c when KVM, DYNAMIC_DEBUG and JUMP_LABEL are all
      enabled. The failure is:
      
      In file included from ./include/linux/printk.h:329:0,
                       from ./include/linux/kernel.h:13,
                       from ./include/asm-generic/bug.h:15,
                       from ./arch/mips/include/asm/bug.h:41,
                       from ./include/linux/bug.h:4,
                       from ./include/linux/thread_info.h:11,
                       from ./include/asm-generic/current.h:4,
                       from ./arch/mips/include/generated/asm/current.h:1,
                       from ./include/linux/sched.h:11,
                       from arch/mips/kvm/tlb.c:13:
      arch/mips/kvm/tlb.c: In function ‘kvm_mips_host_tlb_inv’:
      ./include/linux/dynamic_debug.h:126:3: error: ‘idx_kernel’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         __dynamic_pr_debug(&descriptor, pr_fmt(fmt), \
         ^~~~~~~~~~~~~~~~~~
      arch/mips/kvm/tlb.c:169:16: note: ‘idx_kernel’ was declared here
        int idx_user, idx_kernel;
                      ^~~~~~~~~~
      
      There is a similar error relating to "idx_user". Both errors were
      observed with GCC 6.
      
      As far as I can tell, it is impossible for either idx_user or idx_kernel
      to be uninitialized when they are later read in the calls to kvm_debug,
      but to satisfy the compiler, add zero initializers to both variables.
      Signed-off-by: default avatarJames Cowgill <James.Cowgill@imgtec.com>
      Fixes: 57e3869c ("KVM: MIPS/TLB: Generalise host TLB invalidate to kernel ASID")
      Acked-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87ee4cdd
    • Paolo Bonzini's avatar
      KVM: x86: fix singlestepping over syscall · 3af2b32a
      Paolo Bonzini authored
      commit c8401dda upstream.
      
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior, otherwise you could get #DB on a user stack which is not
      nice.  This does not affect Linux guests, as they use an IST or task gate
      for #DB.
      
      This fixes CVE-2017-7518.
      Reported-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3af2b32a
    • Björn Töpel's avatar
      perf probe: Fix probe definition for inlined functions · 75d73538
      Björn Töpel authored
      commit 7598f8bc upstream.
      
      In commit 613f050d ("perf probe: Fix to probe on gcc generated
      functions in modules"), the offset from symbol is, incorrectly, added
      to the trace point address. This leads to incorrect probe trace points
      for inlined functions and when using relative line number on symbols.
      
      Prior this patch:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
      
      After:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
      
      Committer testing:
      
      Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
      the functions that while not being explicitely marked as inline, were inlined
      by the compiler:
      
        # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
        __ew32
        e1000_regdump
        e1000e_dump_ps_pages
        e1000_desc_unused
        e1000e_systim_to_hwtstamp
        e1000e_rx_hwtstamp
        e1000e_update_rdt_wa
        e1000e_update_tdt_wa
        e1000_put_txbuf
        e1000_consume_page
      
      Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
      them:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
      e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
      that function:
      
        $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
        <SNIP>
        <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
          <13e27c>   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
          <13e287>   DW_AT_low_pc      : 0x17a30
        <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e6f0>   DW_AT_abstract_origin: <0x13ed2c>
          <13e6f4>   DW_AT_low_pc      : 0x17be6
        <SNIP>
        <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
           <13ed2e>   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page
      
      So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
      inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
      address, gives us the offset we should use in the probe definition:
      
        0x17be6 - 0x17a30 = 438
      
      but above we have 876, which is twice as much.
      
      Lets see the second inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
        <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e86f>   DW_AT_abstract_origin: <0x13ed2c>
          <13e873>   DW_AT_low_pc      : 0x17d21
      
        0x17d21 - 0x17a30 = 753
      
      So we where adding it at twice the offset from the containing function as we
      should.
      
      And then after this patch:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
        #
      
      Which matches the two first expansions and shows that because we were
      doubling the offset it would spill over the next function:
      
        readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
         673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
         674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps
      
      This is the 3rd inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
         <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13ec78>   DW_AT_abstract_origin: <0x13ed2c>
          <13ec7c>   DW_AT_low_pc      : 0x17f79
      
        0x17f79 - 0x17a30 = 1353
      
       So:
      
         0x17a30 + 2 * 1353 = 0x184c2
      
        And:
      
         0x184c2 - 0x18090 = 1074
      
      Which explains the bogus third expansion for e1000_consume_page() to end up at:
      
         p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      All fixed now :-)
      
      [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Fixes: 613f050d ("perf probe: Fix to probe on gcc generated functions in modules")
      Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75d73538
    • Kan Liang's avatar
      perf/x86/intel: Add 1G DTLB load/store miss support for SKL · d46eda19
      Kan Liang authored
      commit fb3a5055 upstream.
      
      Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and
      4M page size.
      Need to extend the events to support any page size (4K/2M/4M/1G).
      
      The complete DTLB load/store miss events are:
      
        DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
        DTLB_STORE_MISSES.WALK_COMPLETED		0xe49
      Signed-off-by: default avatarKan Liang <Kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d46eda19
    • Ilya Matveychikov's avatar
      lib/cmdline.c: fix get_options() overflow while parsing ranges · 1a640581
      Ilya Matveychikov authored
      commit a91e0f68 upstream.
      
      When using get_options() it's possible to specify a range of numbers,
      like 1-100500.  The problem is that it doesn't track array size while
      calling internally to get_range() which iterates over the range and
      fills the memory with numbers.
      
      Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.comSigned-off-by: default avatarIlya V. Matveychikov <matvejchikov@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a640581
    • Jan Kara's avatar
      fs/dax.c: fix inefficiency in dax_writeback_mapping_range() · 5f83a744
      Jan Kara authored
      commit 1eb643d0 upstream.
      
      dax_writeback_mapping_range() fails to update iteration index when
      searching radix tree for entries needing cache flushing.  Thus each
      pagevec worth of entries is searched starting from the start which is
      inefficient and prone to livelocks.  Update index properly.
      
      Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
      Fixes: 9973c98e ("dax: add support for fsync/sync")
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f83a744
    • NeilBrown's avatar
      autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL · 909c2562
      NeilBrown authored
      commit 9fa4eb8e upstream.
      
      If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
      autofs4_d_automount() will return
      
         ERR_PTR(status)
      
      with that status to follow_automount(), which will then dereference an
      invalid pointer.
      
      So treat a positive status the same as zero, and map to ENOENT.
      
      See comment in systemd src/core/automount.c::automount_send_ready().
      
      Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.nameSigned-off-by: default avatarNeilBrown <neilb@suse.com>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      909c2562
    • Ravi Bangoria's avatar
      powerpc/perf: Fix oops when kthread execs user process · a2063a20
      Ravi Bangoria authored
      commit bf05fc25 upstream.
      
      When a kthread calls call_usermodehelper() the steps are:
        1. allocate current->mm
        2. load_elf_binary()
        3. populate current->thread.regs
      
      While doing this, interrupts are not disabled. If there is a perf
      interrupt in the middle of this process (i.e. step 1 has completed
      but not yet reached to step 3) and if perf tries to read userspace
      regs, kernel oops with following log:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000da0fc
        ...
        Call Trace:
        perf_output_sample_regs+0x6c/0xd0
        perf_output_sample+0x4e4/0x830
        perf_event_output_forward+0x64/0x90
        __perf_event_overflow+0x8c/0x1e0
        record_and_restart+0x220/0x5c0
        perf_event_interrupt+0x2d8/0x4d0
        performance_monitor_exception+0x54/0x70
        performance_monitor_common+0x158/0x160
        --- interrupt: f01 at avtab_search_node+0x150/0x1a0
            LR = avtab_search_node+0x100/0x1a0
        ...
        load_elf_binary+0x6e8/0x15a0
        search_binary_handler+0xe8/0x290
        do_execveat_common.isra.14+0x5f4/0x840
        call_usermodehelper_exec_async+0x170/0x210
        ret_from_kernel_thread+0x5c/0x7c
      
      Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
      pt_regs are not set.
      
      Fixes: ed4a4ef8 ("powerpc/perf: Add support for sampling interrupt register state")
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2063a20
    • Kees Cook's avatar
      fs/exec.c: account for argv/envp pointers · fed07e89
      Kees Cook authored
      commit 98da7d08 upstream.
      
      When limiting the argv/envp strings during exec to 1/4 of the stack limit,
      the storage of the pointers to the strings was not included.  This means
      that an exec with huge numbers of tiny strings could eat 1/4 of the stack
      limit in strings and then additional space would be later used by the
      pointers to the strings.
      
      For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
      single-byte strings would consume less than 2MB of stack, the max (8MB /
      4) amount allowed, but the pointers to the strings would consume the
      remaining additional stack space (1677721 * 4 == 6710884).
      
      The result (1677721 + 6710884 == 8388605) would exhaust stack space
      entirely.  Controlling this stack exhaustion could result in
      pathological behavior in setuid binaries (CVE-2017-1000365).
      
      [akpm@linux-foundation.org: additional commenting from Kees]
      Fixes: b6a2fea3 ("mm: variable length argument support")
      Link: http://lkml.kernel.org/r/20170622001720.GA32173@beastSigned-off-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fed07e89
    • Takashi Iwai's avatar
      ALSA: hda - Apply quirks to Broxton-T, too · c8bfdd08
      Takashi Iwai authored
      commit c7ecb906 upstream.
      
      Broxton-T was a forgotten child and we didn't apply the quirks for
      Skylake+ properly.  Meanwhile, a quirk for reducing the DMA latency
      seems specific to the early Broxton model, so we leave as is.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8bfdd08
    • Megha Dey's avatar
      ALSA: hda - Add Coffelake PCI ID · 14487f9f
      Megha Dey authored
      commit e79b0006 upstream.
      
      Coffelake is another Intel part, so need to add PCI ID for it.
      Signed-off-by: default avatarMegha Dey <megha.dey@intel.com>
      Signed-off-by: default avatarSubhransu S. Prusty <subhransu.s.prusty@intel.com>
      Acked-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14487f9f
    • Takashi Iwai's avatar
      ALSA: pcm: Don't treat NULL chmap as a fatal error · e6dc243c
      Takashi Iwai authored
      commit 2deaeaf1 upstream.
      
      The standard PCM chmap helper callbacks treat the NULL info->chmap as
      a fatal error and spews the kernel warning with stack trace when
      CONFIG_SND_DEBUG is on.  This was OK, originally it was supposed to be
      always static and non-NULL.  But, as the recent addition of Intel LPE
      audio driver shows, the chmap content may vary dynamically, and it can
      be even NULL when disconnected.  The user still sees the kernel
      warning unnecessarily.
      
      For clearing such a confusion, this patch simply removes the
      snd_BUG_ON() in each place, just returns an error without warning.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6dc243c
    • Takashi Sakamoto's avatar
      ALSA: firewire-lib: Fix stall of process context at packet error · c06718bc
      Takashi Sakamoto authored
      commit 4a9bfafc upstream.
      
      At Linux v3.5, packet processing can be done in process context of ALSA
      PCM application as well as software IRQ context for OHCI 1394. Below is
      an example of the callgraph (some calls are omitted).
      
      ioctl(2) with e.g. HWSYNC
      (sound/core/pcm_native.c)
      ->snd_pcm_common_ioctl1()
        ->snd_pcm_hwsync()
          ->snd_pcm_stream_lock_irq
          (sound/core/pcm_lib.c)
          ->snd_pcm_update_hw_ptr()
            ->snd_pcm_udpate_hw_ptr0()
              ->struct snd_pcm_ops.pointer()
              (sound/firewire/*)
              = Each handler on drivers in ALSA firewire stack
                (sound/firewire/amdtp-stream.c)
                ->amdtp_stream_pcm_pointer()
                  (drivers/firewire/core-iso.c)
                  ->fw_iso_context_flush_completions()
                    ->struct fw_card_driver.flush_iso_completion()
                    (drivers/firewire/ohci.c)
                    = flush_iso_completions()
                      ->struct fw_iso_context.callback.sc
                      (sound/firewire/amdtp-stream.c)
                      = in_stream_callback() or out_stream_callback()
                        ->...
          ->snd_pcm_stream_unlock_irq
      
      When packet queueing error occurs or detecting invalid packets in
      'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
      is called on local CPU with disabled IRQ.
      
      (sound/firewire/amdtp-stream.c)
      in_stream_callback() or out_stream_callback()
      ->amdtp_stream_pcm_abort()
        ->snd_pcm_stop_xrun()
          ->snd_pcm_stream_lock_irqsave()
          ->snd_pcm_stop()
          ->snd_pcm_stream_unlock_irqrestore()
      
      The process is stalled on the CPU due to attempt to acquire recursive lock.
      
      [  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
      [  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0 softirq=38323/38323 fqs=7140
      [  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035, q=5933)
      [  562.630866] Task dump for CPU 2:
      [  562.630867] alsa-source-OXF R  running task        0  6619      1 0x00000008
      [  562.630870] Call Trace:
      [  562.630876]  ? vt_console_print+0x79/0x3e0
      [  562.630880]  ? msg_print_text+0x9d/0x100
      [  562.630883]  ? up+0x32/0x50
      [  562.630885]  ? irq_work_queue+0x8d/0xa0
      [  562.630886]  ? console_unlock+0x2b6/0x4b0
      [  562.630888]  ? vprintk_emit+0x312/0x4a0
      [  562.630892]  ? dev_vprintk_emit+0xbf/0x230
      [  562.630895]  ? do_sys_poll+0x37a/0x550
      [  562.630897]  ? dev_printk_emit+0x4e/0x70
      [  562.630900]  ? __dev_printk+0x3c/0x80
      [  562.630903]  ? _raw_spin_lock+0x20/0x30
      [  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
      [  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
      [  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
      [  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
      [  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
      [  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
      [  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
      [  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
      [  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
      [  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
      [  562.630945]  ? poll_select_copy_remaining+0x150/0x150
      [  562.630947]  ? poll_select_copy_remaining+0x150/0x150
      [  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
      [  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
      [  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
      [  562.630962]  ? futex_wake+0x90/0x170
      [  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
      [  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
      [  562.630972]  ? do_vfs_ioctl+0xa3/0x610
      [  562.630974]  ? vfs_read+0x11b/0x130
      [  562.630976]  ? SyS_ioctl+0x79/0x90
      [  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      
      This commit fixes the above bug. This assumes two cases:
      1. Any error is detected in software IRQ context of OHCI 1394 context.
      In this case, PCM substream should be aborted in packet handler. On the
      other hand, it should not be done in any process context. TO distinguish
      these two context, use 'in_interrupt()' macro.
      2. Any error is detect in process context of ALSA PCM application.
      In this case, PCM substream should not be aborted in packet handler
      because PCM substream lock is acquired. The task to abort PCM substream
      should be done in ALSA PCM core. For this purpose, SNDRV_PCM_POS_XRUN is
      returned at 'struct snd_pcm_ops.pointer()'.
      Suggested-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Fixes: e9148ddd("ALSA: firewire-lib: flush completed packets when reading PCM position")
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c06718bc
    • Jan Beulich's avatar
      xen-blkback: don't leak stack data via response ring · b919d2dc
      Jan Beulich authored
      commit 089bc014 upstream.
      
      Rather than constructing a local structure instance on the stack, fill
      the fields directly on the shared ring, just like other backends do.
      Build on the fact that all response structure flavors are actually
      identical (the old code did make this assumption too).
      
      This is XSA-216.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b919d2dc
    • Juergen Gross's avatar
      xen/blkback: fix disconnect while I/Os in flight · 7bffd6bb
      Juergen Gross authored
      commit 46464411 upstream.
      
      Today disconnecting xen-blkback is broken in case there are still
      I/Os in flight: xen_blkif_disconnect() will bail out early without
      releasing all resources in the hope it will be called again when
      the last request has terminated. This, however, won't happen as
      xen_blkif_free() won't be called on termination of the last running
      request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
      xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
      calls in xen_blkif_disconnect() didn't happen.
      
      To solve this deadlock xen_blkif_disconnect() and
      xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
      xen_blkif_get() but use some other way to do their accounting of
      resources.
      
      This at once fixes another error in xen_blkif_disconnect(): when it
      returned early with -EBUSY for another ring than 0 it would call
      xen_blkif_put() again for already handled rings on a subsequent call.
      This will lead to inconsistencies in the refcnt handling.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarSteven Haigh <netwiz@crc.id.au>
      Acked-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bffd6bb
    • Boris Brezillon's avatar
      clk: sunxi-ng: sun5i: Fix ahb_bist_clk definition · 61ab4c4a
      Boris Brezillon authored
      commit 370d9192 upstream.
      
      AHB BIST gate is actually controlled with bit 7.
      
      This bug was detected while trying to use the NAND controller which is
      using the DMA engine to transfer data to the NAND.
      Since the ahb_bist_clk gate bit conflicts with the ahb_dma_clk gate bit,
      the core was disabling the DMA engine clock as part of its 'disable
      unused clks' procedure, which was causing all DMA transfers to fail after
      this point.
      
      Fixes: 5e737617 ("clk: sunxi-ng: Add sun5i CCU driver")
      Reported-by: default avatarAngus Ainslie <angus@akkea.ca>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Tested-by: default avatarAngus Ainslie <angus@akkea.ca>
      Reviewed-by: default avatarChen-Yu Tsai <wens@csie.org>
      Signed-off-by: default avatarMichael Turquette <mturquette@baylibre.com>
      Link: lkml.kernel.org/r/1495643669-28221-1-git-send-email-boris.brezillon@free-electrons.com
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61ab4c4a
    • Yong Deng's avatar
      clk: sunxi-ng: v3s: Fix usb otg device reset bit · 744b0f61
      Yong Deng authored
      commit 7ffc781e upstream.
      
      V3S's usb otg device reset bit should be 24, not 23.
      Signed-off-by: default avatarYong Deng <iemdey@gmail.com>
      Reviewed-By: default avatarIcenowy Zheng <icenowy@aosc.io>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      744b0f61
    • Chen-Yu Tsai's avatar
      clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset · 861b4d95
      Chen-Yu Tsai authored
      commit 38b8f823 upstream.
      
      The register offset for the lcd1-ch1 clock was incorrectly pointing to
      the lcd0-ch1 clock. This resulted in the lcd0-ch1 clock being disabled
      when the clk core disables unused clocks. This then stops the simplefb
      HDMI output path.
      Reported-by: default avatarBob Ham <rah@settrans.net>
      Fixes: c6e6c96d ("clk: sunxi-ng: Add A31/A31s clocks")
      Signed-off-by: default avatarChen-Yu Tsai <wens@csie.org>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      861b4d95
  2. 24 Jun, 2017 15 commits