1. 25 May, 2017 40 commits
    • xhci: remove GFP_DMA flag from allocation · 374a3fb5
      Matthias Lange authored
      commit 5db851cf upstream.
      
      There is no reason to restrict allocations to the first 16MB ISA DMA
      addresses.
      
      It is causing problems in a virtualization setup with enabled IOMMU
      (x86_64). The result is that USB is not working in the VM.
      Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libnvdimm: fix clear length of nvdimm_forget_poison() · fa313fd6
      Toshi Kani authored
      commit 8d13c029 upstream.
      
      ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
      of error actually cleared, which may be smaller than its requested
      'len'.
      
      Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
      'clear_err.cleared' when this value is valid.
      
      Fixes: e046114a ("libnvdimm: clear the internal poison_list when clearing badblocks")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
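The libnvdimm change above can be illustrated with a small Python model. This is only a sketch, not the kernel code: the poison list is modelled as a set of offsets, `forget_poison` and `clear_poison` are hypothetical stand-ins for nvdimm_forget_poison() and nvdimm_clear_poison(), and the validity check on `cleared` is an assumption.

```python
def forget_poison(poison, start, length):
    """Drop every poisoned offset in [start, start + length) from the set."""
    poison.difference_update(range(start, start + length))


def clear_poison(poison, start, requested, cleared):
    """Forget only what the modelled hardware reports as actually cleared.

    `cleared` stands in for clear_err.cleared and may be smaller than
    `requested`. The validity check below is an assumption of this sketch.
    """
    if 0 < cleared <= requested:
        forget_poison(poison, start, cleared)
    return cleared


poison = {0, 1, 2, 3}
clear_poison(poison, start=0, requested=4, cleared=2)
print(sorted(poison))  # offsets 2 and 3 are still tracked as poisoned
```

Passing the requested length instead of the cleared length would drop offsets 2 and 3 from the list even though they were never cleared.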
    • fscrypt: avoid collisions when presenting long encrypted filenames · af9bd521
      Eric Biggers authored
      commit 6b06cdee upstream.
      
      When accessing an encrypted directory without the key, userspace must
      operate on filenames derived from the ciphertext names, which contain
      arbitrary bytes.  Since we must support filenames as long as NAME_MAX,
      we can't always just base64-encode the ciphertext, since that may make
      it too long.  Currently, this is solved by presenting long names in an
      abbreviated form containing any needed filesystem-specific hashes (e.g.
      to identify a directory block), then the last 16 bytes of ciphertext.
      This needs to be sufficient to identify the actual name on lookup.
      
      However, there is a bug.  It seems to have been assumed that due to the
      use of a CBC (cipher block chaining)-based encryption mode, the last
      16 bytes (i.e. the AES block size) of ciphertext would depend on the
      full plaintext, preventing collisions.  However, we actually use CBC
      with ciphertext stealing (CTS), which handles the last two blocks
      specially, causing them to appear "flipped".  Thus, it's actually the
      second-to-last block which depends on the full plaintext.
      
      This caused long filenames that differ only near the end of their
      plaintexts to, when observed without the key, point to the wrong inode
      and be undeletable.  For example, with ext4:
      
          # echo pass | e4crypt add_key -p 16 edir/
          # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
          # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
          100000
          # sync
          # echo 3 > /proc/sys/vm/drop_caches
          # keyctl new_session
          # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
          2004
          # rm -rf edir/
          rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
          ...
      
      To fix this, when presenting long encrypted filenames, encode the
      second-to-last block of ciphertext rather than the last 16 bytes.
      
      Although it would be nice to solve this without depending on a specific
      encryption mode, that would mean doing a cryptographic hash like SHA-256
      which would be much less efficient.  This way is sufficient for now, and
      it's still compatible with encryption modes like HEH which are strong
      pseudorandom permutations.  Also, changing the presented names is still
      allowed at any time because they are only provided to allow applications
      to do things like delete encrypted directories.  They're not designed to
      be used to persistently identify files --- which would be hard to do
      anyway, given that they're encrypted after all.
      
      For ease of backports, this patch only makes the minimal fix to both
      ext4 and f2fs.  It leaves ubifs as-is, since ubifs doesn't compare the
      ciphertext block yet.  Follow-on patches will clean things up properly
      and make the filesystems use a shared helper function.
      
      Fixes: 5de0b4d0 ("ext4 crypto: simplify and speed up filename encryption")
      Reported-by: Gwendal Grignou <gwendal@chromium.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
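The block-ordering effect described in the fscrypt commit above can be reproduced with a toy Python model. It is a sketch under stated assumptions: `toy_encrypt_block` is a SHA-256-based stand-in for AES (not a real cipher), the IV is zero, names are zero-padded to a multiple of 16 bytes, and only block-aligned inputs are handled. All function names are hypothetical.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def toy_encrypt_block(key, block):
    """Stand-in for AES: NOT a real cipher, only a keyed function of the block."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_cts_encrypt(key, plaintext):
    """CBC with a zero IV, then swap the last two ciphertext blocks (CTS layout)."""
    if len(plaintext) % BLOCK or len(plaintext) < 2 * BLOCK:
        raise ValueError("sketch handles only block-aligned input of >= 2 blocks")
    prev = bytes(BLOCK)
    blocks = []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        blocks.append(prev)
    blocks[-2], blocks[-1] = blocks[-1], blocks[-2]
    return b"".join(blocks)


def pad(name):
    """Zero-pad a name to a multiple of the block size."""
    return name + bytes(-len(name) % BLOCK)


key = b"k" * 16
a = cbc_cts_encrypt(key, pad(b"abcdefghijklmnopqrstuvwxyz0123451"))
b = cbc_cts_encrypt(key, pad(b"abcdefghijklmnopqrstuvwxyz0123452"))
print(a[-BLOCK:] == b[-BLOCK:])                      # True: last block collides
print(a[-2 * BLOCK:-BLOCK] == b[-2 * BLOCK:-BLOCK])  # False: second-to-last differs
```

The two names differ only in their third block. After the swap, the final 16 bytes of ciphertext are the chained value of the second block, which is identical for both names, while the second-to-last block depends on the full plaintext.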
    • f2fs: check entire encrypted bigname when finding a dentry · 8daed21d
      Jaegeuk Kim authored
      commit 6332cd32 upstream.
      
      If the user has no key for an encrypted directory, fscrypt presents digested
      dentries.  Previously, when looking up a dentry, f2fs only compared its hash
      value with the first 4 bytes of the digested dentry, which did not fully
      handle hash collisions.  This patch extends the check to the entire digested
      dentry, as ext4 already does.
      
      Eric reported how to reproduce this issue by:
      
       # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
       # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
      100000
       # sync
       # echo 3 > /proc/sys/vm/drop_caches
       # keyctl new_session
       # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
      99999
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      (fixed f2fs_dentry_hash() to work even when the hash is 0)
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: chaoskey: fix Alea quirk on big-endian hosts · b9c0da62
      Johan Hovold authored
      commit 63afd5cc upstream.
      
      Add missing endianness conversion when applying the Alea timeout quirk.
      
      Found using sparse:
      
      	warning: restricted __le16 degrades to integer
      
      Fixes: e4a886e8 ("hwrng: chaoskey - Fix URB warning due to timeout on Alea")
      Cc: Bob Ham <bob.ham@collabora.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Keith Packard <keithp@keithp.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
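Why the missing conversion in the chaoskey commit above only shows up on big-endian hosts can be sketched in Python. USB descriptor fields are little-endian on the wire; the field value 0x1234 and both function names below are hypothetical, chosen only for illustration.

```python
import struct


def read_le16(raw):
    """Decode a little-endian 16-bit field, independent of host byte order."""
    return struct.unpack("<H", raw)[0]


def read_as_big_endian_host(raw):
    """What a big-endian host sees if it loads the same bytes without converting."""
    return struct.unpack(">H", raw)[0]


raw_field = struct.pack("<H", 0x1234)  # hypothetical descriptor value

print(hex(read_le16(raw_field)))                # 0x1234
print(hex(read_as_big_endian_host(raw_field)))  # 0x3412
```

On a little-endian host the unconverted load happens to give the right value, which is why a comparison against a constant works there and silently fails on big-endian machines.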
    • USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs · 545a3171
      Andrey Korolyov authored
      commit 5f63424a upstream.
      
      This patch adds support for recognition of ARM-USB-TINY(H) devices, which
      are almost identical to ARM-USB-OCD(H) but lack a separate barrel jack
      and serial console.
      
      As suggested by Johan Hovold, ftdi_jtag_quirk can be replaced with a
      slightly more generic construction. Since all Olimex ARM debuggers have
      exactly two ports, we can safely always use only the second port within
      the debugger family.
      Signed-off-by: Andrey Korolyov <andrey@xdel.ru>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: serial: ftdi_sio: fix setting latency for unprivileged users · 038ccaa5
      Anthony Mallet authored
      commit bb246681 upstream.
      
      Commit 557aaa7f ("ft232: support the ASYNC_LOW_LATENCY
      flag") enables unprivileged users to set the FTDI latency timer,
      but there was a logic flaw that skipped sending the corresponding
      USB control message to the device.
      
      Specifically, the device latency timer would not be updated until next
      open, something which was later also inadvertently broken by commit
      c19db4c9 ("USB: ftdi_sio: set device latency timeout at port
      probe").
      
      A recent commit c6dce262 ("USB: serial: ftdi_sio: fix extreme
      low-latency setting") disabled the low-latency mode by default so we now
      need this fix to allow unprivileged users to again enable it.
      Signed-off-by: Anthony Mallet <anthony.mallet@laas.fr>
      [johan: amend commit message]
      Fixes: 557aaa7f ("ft232: support the ASYNC_LOW_LATENCY flag")
      Fixes: c19db4c9 ("USB: ftdi_sio: set device latency timeout at port probe").
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() · 2ea2f891
      Kirill Tkhai authored
      commit 3fd37226 upstream.
      
      Imagine we have a pid namespace and a task from its parent's pid_ns
      which has called setns() to enter that pid namespace. The task is doing
      fork() while the pid namespace's child reaper is dying. There is a race
      between them:
      
      Task from parent pid_ns             Child reaper
      copy_process()                      ..
        alloc_pid()                       ..
        ..                                zap_pid_ns_processes()
        ..                                  disable_pid_allocation()
        ..                                  read_lock(&tasklist_lock)
        ..                                  iterate over pids in pid_ns
        ..                                    kill tasks linked to pids
        ..                                  read_unlock(&tasklist_lock)
        write_lock_irq(&tasklist_lock);   ..
        attach_pid(p, PIDTYPE_PID);       ..
        ..                                ..
      
      So the newly created task p won't receive the SIGKILL signal,
      and the pid namespace will be left in a contradictory state.
      Only a manual kill will help there, but does userspace
      care about this? I suppose most users just inject
      a task into a pid namespace and wait for a SIGCHLD from it.
      
      The patch fixes the problem. It simply checks for
      (pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
      We do it under the tasklist_lock, and can't skip
      PIDNS_HASH_ADDING as noted by Oleg:
      
      "zap_pid_ns_processes() does disable_pid_allocation()
      and then takes tasklist_lock to kill the whole namespace.
      Given that copy_process() checks PIDNS_HASH_ADDING
      under write_lock(tasklist) they can't race;
      if copy_process() takes this lock first, the new child will
      be killed, otherwise copy_process() can't miss
      the change in ->nr_hashed."
      
      If allocation is disabled, we just return -ENOMEM,
      as alloc_pid() does in such cases.
      
      v2: Do not move disable_pid_allocation(), do not
      introduce a new variable in copy_process() and simplify
      the patch as suggested by Oleg Nesterov.
      Account the problem with double irq enabling
      found by Eric W. Biederman.
      
      Fixes: c876ad76 ("pidns: Stop pid allocation when init dies")
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Ingo Molnar <mingo@kernel.org>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
      CC: Michal Hocko <mhocko@suse.com>
      CC: Andy Lutomirski <luto@kernel.org>
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      CC: Andrei Vagin <avagin@openvz.org>
      CC: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Serge Hallyn <serge@hallyn.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
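The locking argument quoted from Oleg in the pid_ns commit above can be modelled in a few lines of Python. This is a toy sketch, not kernel code: `PidNamespace`, `zap_processes` and `attach_new_task` are hypothetical names, a single `threading.Lock` stands in for tasklist_lock, and a boolean stands in for the PIDNS_HASH_ADDING bit.

```python
import threading


class PidNamespace:
    """Toy model: one lock, one 'adding allowed' flag, a set of attached tasks."""

    def __init__(self):
        self.tasklist_lock = threading.Lock()
        self.adding_allowed = True  # models PIDNS_HASH_ADDING in ->nr_hashed
        self.attached = set()
        self.killed = set()

    def zap_processes(self):
        # Models zap_pid_ns_processes(): disable allocation first, then
        # take the lock and kill everything that is already attached.
        self.adding_allowed = False
        with self.tasklist_lock:
            self.killed |= self.attached
            self.attached.clear()

    def attach_new_task(self, task):
        # Models the fixed copy_process(): the flag is checked under the lock,
        # so a task is either attached before the kill pass or refused.
        with self.tasklist_lock:
            if not self.adding_allowed:
                return False  # the patch returns -ENOMEM at this point
            self.attached.add(task)
            return True


ns = PidNamespace()
print(ns.attach_new_task("early"))  # True: attached, will be killed by the zap
ns.zap_processes()
print(ns.attach_new_task("late"))   # False: allocation already disabled
```

Whichever side takes the lock first, no task can end up attached after the kill pass has run, which is the property the unfixed code lacked.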
    • pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes · 6dc6a270
      Eric W. Biederman authored
      commit b9a985db upstream.
      
      The code can potentially sleep for an indefinite amount of time in
      zap_pid_ns_processes, triggering the hung task timeout and increasing
      the system load average.  This is undesirable.  Sleep with a task state of
      TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these
      undesirable side effects.
      
      Apparently under heavy load this has been allowing Chrome to trigger
      the hung task timeout error and cause ChromeOS to reboot.
      Reported-by: Vovo Yang <vovoy@google.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Fixes: 6347e900 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Fix a subcontext memory leak · 5e40ac3f
      Michael J. Ruhl authored
      commit 224d71f9 upstream.
      
      The only context that frees user_exp_rcv data structures is the last
      context closed (from a sub-context set).  This leaks the allocations
      from the other sub-contexts.  Separate the common frees from the
      specific frees and call them at the appropriate time.
      
      Using KEDR to check for memory leaks we get:
      
      Before test:
      
      [leak_check] Possible leaks: 25
      
      After test:
      
      [leak_check] Possible leaks: 31  (6 leaked data structures)
      
      After patch applied (before and after test have the same value)
      
      [leak_check] Possible leaks: 25
      
      Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Return an error on memory allocation failure · b894ea82
      Michael J. Ruhl authored
      commit 94679061 upstream.
      
      If the eager buffer allocation fails, it is necessary to return
      an error code.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IIO: bmp280-core.c: fix error in humidity calculation · dfb450b2
      Andreas Klinger authored
      commit ed3730c4 upstream.
      
      While calculating the humidity compensation, negative values are
      interpreted as unsigned because unsigned variables are used.  These values,
      as well as the constants, need to be cast to signed as indicated by the
      documentation of the sensor.
      Signed-off-by: Andreas Klinger <ak@it-klinger.de>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Matt Ranostay <matt.ranostay@konsulko.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
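The class of bug fixed in the bmp280 commit above can be shown by emulating 32-bit arithmetic in Python. This is a generic sketch, not the driver's compensation formula: the helper names are hypothetical, and the reading 50000 and offset 76800 are illustrative values only.

```python
def as_u32(x):
    """Wrap an integer the way a 32-bit unsigned variable would."""
    return x & 0xFFFFFFFF


def as_s32(x):
    """Reinterpret the low 32 bits as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def scaled_difference_unsigned(reading, offset):
    """Intermediate kept in an unsigned variable: a negative result wraps."""
    return as_u32(reading - offset) >> 4


def scaled_difference_signed(reading, offset):
    """Intermediate kept signed: the shift preserves the sign."""
    return as_s32(reading - offset) >> 4


print(scaled_difference_signed(50000, 76800))    # -1675
print(scaled_difference_unsigned(50000, 76800))  # a huge positive number
```

Both variants agree whenever the difference is non-negative, which is why such a bug only appears for part of the input range.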
    • iio: dac: ad7303: fix channel description · a03176f9
      Pavel Roskin authored
      commit ce420fd4 upstream.
      
      realbits, storagebits and shift should be numbers, not ASCII characters.
      Signed-off-by: Pavel Roskin <plroskin@gmail.com>
      Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
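The difference between a number and an ASCII character, as in the ad7303 commit above, is easy to quantify. In C, a character constant such as '8' is an int holding the character code, so it silently initializes an integer field with 56 rather than 8. The Python sketch below mirrors that with `ord()`; `max_code` is a hypothetical helper, not part of the IIO API.

```python
def max_code(realbits):
    """Largest raw value representable in `realbits` bits."""
    return (1 << realbits) - 1


print(ord("8"), ord("0"))      # 56 48: the values the character constants carry
print(max_code(8))             # 255, the intended 8-bit range
print(max_code(ord("8")))      # a 56-bit range, far beyond an 8-bit DAC
```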
    • ibmvscsis: Do not send aborted task response · 05a36277
      Bryant G. Ly authored
      commit 25e78531 upstream.
      
      The driver is sending a response to the actual scsi op that was
      aborted by an abort task TM, while LIO is sending a response to
      the abort task TM.
      
      ibmvscsis_tgt does not send the response to the client until
      release_cmd time. The reason for this was because if we did it
      at queue_status time, then the client would be free to reuse the
      tag for that command, but we're still using the tag until the
      command is released at release_cmd time, so we chose to delay
      sending the response until then. That then caused this issue, because
      release_cmd is always called, even if queue_status is not.
      
      The SCSI spec says that the initiator that sends the abort task
      TM NEVER gets a response to the aborted op, yet the current
      code sends one. This fix therefore suppresses that response
      when CMD_T_ABORTED && !CMD_T_TAS.
      
      Another case, with a small timing window, is where LIO sends a
      TMR_DOES_NOT_EXIST and the release_cmd callback is called for the TMR Abort
      cmd before the release_cmd for the (attempted) aborted cmd. In that case we
      need to ensure that we send the response for the (attempted) abort cmd to
      the client before we send the response for the TMR Abort cmd.
      Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
      Signed-off-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of: fdt: add missing allocation-failure check · 9907c838
      Johan Hovold authored
      commit 49e67dd1 upstream.
      
      The memory allocator passed to __unflatten_device_tree() (e.g. a wrapped
      kzalloc) can fail so add the missing sanity check to avoid dereferencing
      a NULL pointer.
      
      Fixes: fe140423 ("of/flattree: Refactor unflatten_device_tree and add fdt_unflatten_tree")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of: fix "/cpus" reference leak in of_numa_parse_cpu_nodes() · 80cdf206
      Tyrel Datwyler authored
      commit b8475cbe upstream.
      
      The call to of_find_node_by_path("/cpus") returns the cpus device_node
      with its reference count incremented. There is no matching of_node_put()
      call in of_numa_parse_cpu_nodes() which results in a leaked reference
      to the "/cpus" node.
      
      This patch adds an of_node_put() to release the reference.
      
      fixes: 298535c0 ("of, numa: Add NUMA of binding implementation.")
      Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Acked-by: David Daney <david.daney@cavium.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of: fix sparse warning in of_pci_range_parser_one · ae5074ba
      Rob Herring authored
      commit eb310036 upstream.
      
      sparse gives the following warning for 'pci_space':
      
      ../drivers/of/address.c:266:26: warning: incorrect type in assignment (different base types)
      ../drivers/of/address.c:266:26:    expected unsigned int [unsigned] [usertype] pci_space
      ../drivers/of/address.c:266:26:    got restricted __be32 const [usertype] <noident>
      
      It appears that pci_space is only ever accessed on powerpc, so the endian
      swap is often not needed.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • proc: Fix unbalanced hard link numbers · d10b21d6
      Takashi Iwai authored
      commit d66bb160 upstream.
      
      proc_create_mount_point() forgot to increase the parent's nlink, and
      it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one
      less than expected.
      
      Fixes: eb6d38d5 ("proc: Allow creating permanently empty directories...")
      Reported-by: Tristan Ye <tristan.ye@suse.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cxl: Route eeh events to all drivers in cxl_pci_error_detected() · 168b2bfa
      Vaibhav Jain authored
      commit 4f58f0bf upstream.
      
      Fix a boundary condition where in some cases an eeh event that results
      in card reset isn't passed on to a driver attached to the virtual PCI
      device associated with a slice. This happens when a slice
      attached device driver returns a value other than
      PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
      would result in an early return from cxl_pci_error_detected(), and
      other drivers attached to other AFUs on the card won't be notified.
      
      The patch fixes this by making sure that all slice attached
      device-drivers are notified and the return values from
      error_detected() callback are aggregated in a scheme where request for
      'disconnect' trumps all and 'none' trumps 'need_reset'.
      
      Fixes: 9e8df8a2 ("cxl: EEH support")
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
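The aggregation scheme described in the cxl commit above ("'disconnect' trumps all and 'none' trumps 'need_reset'") can be sketched as follows. This is a model only: the `Result` enum loosely mirrors the PCI_ERS_RESULT_* names, `aggregate` and `notify_all` are hypothetical, and treating 'need_reset' as the default outcome is an assumption of the sketch.

```python
from enum import Enum


class Result(Enum):
    NONE = "none"
    NEED_RESET = "need_reset"
    DISCONNECT = "disconnect"


def aggregate(results):
    """Combine per-driver results: disconnect > none > need_reset."""
    if Result.DISCONNECT in results:
        return Result.DISCONNECT
    if Result.NONE in results:
        return Result.NONE
    return Result.NEED_RESET  # assumed default when every driver asks for a reset


def notify_all(callbacks):
    """Call every driver callback (no early return), then aggregate."""
    results = [callback() for callback in callbacks]
    return aggregate(results)


seen = []


def driver(name, result):
    def error_detected():
        seen.append(name)
        return result
    return error_detected


overall = notify_all([driver("afu0", Result.NONE), driver("afu1", Result.NEED_RESET)])
print(overall, seen)  # Result.NONE ['afu0', 'afu1']
```

The point of collecting all results before deciding is that a non-reset answer from the first driver no longer prevents later drivers from being notified.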
    • cxl: Force context lock during EEH flow · 39353129
      Vaibhav Jain authored
      commit ea9a26d1 upstream.
      
      During an eeh event, when the cxl card is fenced and the card sysfs attr
      perst_reloads_same_image is set, the following warning message is seen in
      the kernel logs:
      
        Adapter context unlocked with 0 active contexts
        ------------[ cut here ]------------
        WARNING: CPU: 12 PID: 627 at
        ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
      
      Even though this warning is harmless, it clutters the kernel log
      during an eeh event. The warning is triggered because the EEH callback
      cxl_pci_error_detected doesn't obtain a context-lock before forcibly
      detaching all active contexts, so when the context-lock is released during
      the call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
      cxl_adapter_context_unlock is triggered.
      
      To fix this warning, we acquire the adapter context-lock via
      cxl_adapter_context_lock() in the eeh callback
      cxl_pci_error_detected() once all the virtual AFU PHBs are notified
      and their contexts detached. The context-lock is released in
      cxl_pci_slot_reset() after the adapter is successfully reconfigured
      and before we call the slot_reset callback on slice attached
      device-drivers.
      
      Fixes: 70b565bb ("cxl: Prevent adapter reset if an active context exists")
      Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Tested-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ohci-pci: add qemu quirk · fc6b678a
      Gerd Hoffmann authored
      commit 21a60f6e upstream.
      
      On a loaded virtualization host (a dozen guests booting at the same time)
      it may happen that the ohci controller emulation doesn't manage to do
      timely frame processing, with the result that the io watchdog fires and
      considers the controller dead, even though it's only the emulation
      being unusually slow due to the load peak.
      
      So, add a quirk for qemu and don't use the watchdog in case we figure we
      are running on emulated ohci.  The virtual ohci controller masquerades
      as an Apple ohci controller, but we can identify it by subsystem id.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cdc-acm: fix possible invalid access when processing notification · 809ae061
      Tobias Herzog authored
      commit 1bb9914e upstream.
      
      Notifications may be only 8 bytes long. Accessing the 9th and
      10th byte of unimplemented/unknown notifications may be insecure.
      Also check the length of known notifications before accessing anything
      beyond the 8th byte.
      Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
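The length checks described in the cdc-acm commit above follow a common parsing pattern, sketched here in Python. The 8-byte header layout (bmRequestType, bNotificationType, wValue, wIndex, wLength) and the SERIAL_STATE code 0x20 with two data bytes come from the USB CDC specification; `parse_notification` and its return values are hypothetical and do not mirror the driver's structure.

```python
import struct

HEADER_LEN = 8       # fixed CDC notification header
SERIAL_STATE = 0x20  # notification carrying two data bytes after the header


def parse_notification(buf):
    """Return a (kind, value) tuple, or None if the buffer is too short."""
    if len(buf) < HEADER_LEN:
        return None
    _req_type, notification, _value, _index, _length = struct.unpack_from("<BBHHH", buf)
    if notification == SERIAL_STATE:
        # Bytes 9 and 10 are only read after checking that they exist.
        if len(buf) < HEADER_LEN + 2:
            return None
        (state,) = struct.unpack_from("<H", buf, HEADER_LEN)
        return ("serial_state", state)
    # Unknown notifications: report the type without touching any data bytes.
    return ("other", notification)


header = struct.pack("<BBHHH", 0xA1, SERIAL_STATE, 0, 0, 2)
print(parse_notification(header))                          # None: data bytes missing
print(parse_notification(header + struct.pack("<H", 3)))   # ('serial_state', 3)
```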
    • gpio: omap: return error if requested debounce time is not possible · 198ab403
      David Rivshin authored
      commit 83977443 upstream.
      
      omap_gpio_debounce() does not validate that the requested debounce
      is within a range it can handle. Instead it lets the register value
      wrap silently, and always returns success.
      
      This can lead to all sorts of unexpected behavior, such as gpio_keys
      asking for a too-long debounce, but getting a very short debounce in
      practice.
      
      Fix this by returning -EINVAL if the requested value does not fit into
      the register field. If there is no debounce clock available at all,
      return -ENOTSUPP.
      
      Fixes: e85ec6c3 ("gpio: omap: fix omap2_set_gpio_debounce")
      Signed-off-by: David Rivshin <drivshin@allworx.com>
      Acked-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
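The range check described in the gpio-omap commit above can be sketched in Python. The 31 µs step and the 8-bit register field are assumptions made for illustration, and `debounce_register_value` is a hypothetical name. ENOTSUPP (524) is the kernel-internal error code, which Python's `errno` module does not define.

```python
import errno

STEP_US = 31       # assumed debounce granularity in microseconds
FIELD_MASK = 0xFF  # assumed 8-bit debounce time field
ENOTSUPP = 524     # kernel-internal errno value, not exported to userspace


def debounce_register_value(debounce_us, have_debounce_clock=True):
    """Return the register value, or a negative errno if it cannot be honoured."""
    if not have_debounce_clock:
        return -ENOTSUPP
    # Round up to whole steps; the field stores (steps - 1).
    value = -(-debounce_us // STEP_US) - 1
    if value < 0 or (value & FIELD_MASK) != value:
        return -errno.EINVAL  # would previously have wrapped silently
    return value


print(debounce_register_value(31))           # 0
print(debounce_register_value(256 * 31))     # 255, the largest value that fits
print(debounce_register_value(256 * 31 + 1)) # -EINVAL instead of wrapping to 0
```

Without the mask comparison, the last request would wrap to a register value of 0 and yield the shortest possible debounce, which is the gpio_keys symptom the commit describes.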
    • drm/nouveau/tmr: handle races with hw when updating the next alarm time · b77adf29
      Ben Skeggs authored
      commit 1b0f8438 upstream.
      
      If the time to the next alarm is short enough, we could race with HW and
      end up with an ~4 second delay until it triggers.
      
      Fix this by checking again after we update HW.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/nouveau/tmr: avoid processing completed alarms when adding a new one · 1ec3c712
      Ben Skeggs authored
      commit 330bdf62 upstream.
      
      The idea here was to avoid having to "manually" program the HW if there's
      a new earliest alarm.  This was lazy and bad, as it leads to loads of fun
      races between inter-related callers (ie. therm).
      
      Turns out, it's not so difficult after all.  Go figure ;)
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm · 6445a49a
      Ben Skeggs authored
      commit 9fc64667 upstream.
      
      At least therm/fantog "attempts" to work around this issue, which could
      lead to corruption of the pending alarm list.
      
      Fix it properly by not updating the timestamp without the lock held, or
      trying to add an already pending alarm to the pending alarm list....
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/nouveau/tmr: ack interrupt before processing alarms · 16e10490
      Ben Skeggs authored
      commit 3733bd8b upstream.
      
      Fixes a race where we can miss an alarm that triggers while we're already
      processing previous alarms.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/nouveau/therm: remove ineffective workarounds for alarm bugs · e8ee6305
      Ben Skeggs authored
      commit e4311ee5 upstream.
      
      These were ineffective due to touching the list without the alarm lock,
      but should no longer be required.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: Add missing lb_vblank_lead_lines setup to DCE-6 path. · d1f006ef
      Mario Kleiner authored
      commit effaf848 upstream.
      
      This apparently got lost when implementing the new DCE-6 support
      and would cause failures in pageflip scheduling and timestamping.
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations. · b334b349
      Mario Kleiner authored
      commit e190ed1e upstream.
      
      At dot clocks > approx. 250 MHz, some of these calcs will overflow and
      cause miscalculation of latency watermarks and, for some overflows, also
      a divide-by-zero driver crash ("divide error: 0000 [#1] PREEMPT SMP" in
      "dce_v10_0_latency_watermark+0x12d/0x190").
      
      This zero-divide happened, e.g., on AMD Tonga Pro under DCE-10,
      on a DisplayPort panel when trying to set a video mode of 2560x1440
      at 165 Hz vrefresh with a dot clock of 635.540 MHz.
      
      Refine calculations to avoid the overflows.
      
      Tested for DCE-10 with R9 380 Tonga + ASUS ROG PG279 panel.
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b334b349
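The kind of overflow described in the commit above can be sketched in user-space C. This is only an illustration, not the driver's code: the function names and the choice of product (dot clock times active width times bytes per pixel) are assumptions made for the example. It shows how a 32-bit intermediate wraps at the dot clock quoted in the message, how widening to 64 bits before multiplying avoids that, and why a wrapped value that lands on zero is dangerous as a divisor.

```c
#include <stdint.h>

/* Hypothetical intermediate product computed entirely in 32 bits.
 * Unsigned arithmetic wraps modulo 2^32, so large inputs silently
 * produce a wrong (smaller) result. */
uint32_t product_u32(uint32_t clock_khz, uint32_t hactive, uint32_t bytes_per_pixel)
{
    return clock_khz * hactive * bytes_per_pixel;
}

/* Same product, widened to 64 bits before the first multiplication,
 * so the intermediate result cannot wrap for 32-bit inputs of this size. */
uint64_t product_u64(uint32_t clock_khz, uint32_t hactive, uint32_t bytes_per_pixel)
{
    return (uint64_t)clock_khz * hactive * bytes_per_pixel;
}

/* Division with an explicit zero check. Returning 0 for a zero divisor
 * is an illustrative policy chosen for this sketch only. */
uint64_t div_guarded(uint64_t num, uint64_t den)
{
    return den ? num / den : 0;
}
```

With a dot clock of 635540 kHz, 2560 active pixels and an assumed 4 bytes per pixel, the true product is 6507929600, which does not fit in 32 bits. A wrapped product can also come out as exactly zero (for example 65536 * 65536), which is how an overflow can turn into a divide error further down a calculation.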
    • Mario Kleiner's avatar
      drm/amdgpu: Make display watermark calculations more accurate · ebf3cf5b
      Mario Kleiner authored
      commit d63c277d upstream.
      
      Avoid big roundoff errors in scanline/hactive durations for
      high pixel clocks, especially for >= 500 MHz, and thereby
      program more accurate display FIFO watermarks.
      
      Implemented here for DCE 6,8,10,11.
      Successfully tested on DCE 10 with AMD R9 380 Tonga.
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebf3cf5b
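The roundoff problem named in the commit above can be illustrated with a small user-space sketch. The function names and the htotal value used below are assumptions for the example, and the code is not taken from the driver. It compares rounding the per-pixel period to whole nanoseconds before scaling against multiplying first and dividing once at the end.

```c
#include <stdint.h>

/* Coarse variant: the per-pixel period is truncated to whole nanoseconds
 * first, so at high clocks most of its precision is lost before scaling. */
uint32_t line_time_coarse_ns(uint32_t htotal, uint32_t clock_khz)
{
    uint32_t pixel_period_ns = 1000000u / clock_khz; /* ns per pixel, truncated */
    return htotal * pixel_period_ns;
}

/* Finer variant: multiply first (in 64 bits), divide once at the end,
 * so only the final result is truncated. */
uint32_t line_time_fine_ns(uint32_t htotal, uint32_t clock_khz)
{
    return (uint32_t)(((uint64_t)htotal * 1000000u) / clock_khz);
}
```

For an assumed htotal of 2720 pixels at 635540 kHz, the coarse variant yields 2720 ns because the pixel period truncates to 1 ns, while the finer variant yields 4279 ns. At 100000 kHz the pixel period is exactly 10 ns and both variants agree, which is why the error only shows up at high pixel clocks.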
    • Johan Hovold's avatar
      ath9k_htc: fix NULL-deref at probe · adc6647c
      Johan Hovold authored
      commit ebeb3667 upstream.
      
      Make sure to check the number of endpoints to avoid dereferencing a
      NULL-pointer or accessing memory beyond the endpoint array should a
      malicious device lack the expected endpoints.
      
      Fixes: 36bcce43 ("ath9k_htc: Handle storage devices")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      adc6647c
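The endpoint sanity check described in the commit above follows a common probe-time pattern, sketched here as a user-space model. The structures, the required count of 2, and the function name are hypothetical stand-ins rather than the kernel's USB types or the driver's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a USB interface description. */
struct model_endpoint {
    uint8_t address;
};

struct model_interface {
    uint8_t num_endpoints;                  /* count reported by the device */
    const struct model_endpoint *endpoints; /* may be NULL when the count is 0 */
};

/* Number of endpoints this model driver touches during probe (assumed). */
#define MODEL_REQUIRED_ENDPOINTS 2

/* Reject the device before any endpoint is dereferenced. */
int model_probe(const struct model_interface *intf)
{
    if (intf->num_endpoints < MODEL_REQUIRED_ENDPOINTS)
        return -ENODEV;

    /* Safe now: indices 0 and 1 are known to exist. */
    (void)intf->endpoints[0].address;
    (void)intf->endpoints[1].address;
    return 0;
}
```

The point of the ordering is that the count comes from the device and is untrusted, so it has to be validated before the endpoint array is indexed.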
    • Dmitry Tunin's avatar
      ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device · c39bafb9
      Dmitry Tunin authored
      commit 16ff1fb0 upstream.
      
      T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=01 Dev#=  7 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=ff MxPS=64 #Cfgs=  1
      P:  Vendor=1eda ProdID=2315 Rev=01.08
      S:  Manufacturer=ATHEROS
      S:  Product=USB2.0 WLAN
      S:  SerialNumber=12345
      C:  #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c39bafb9
    • Martin Schwidefsky's avatar
      s390/cputime: fix incorrect system time · 768ae64b
      Martin Schwidefsky authored
      commit 07a63cbe upstream.
      
      git commit c5328901 "[S390] entry[64].S improvements" removed
      the update of the exit_timer lowcore field from the critical section
      cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done
      blocks. If the PSW is updated by the critical section cleanup to point to
      user space again, the interrupt entry code will do a vtime calculation
      after the cleanup has completed with an exit_timer value which has *not* been
      updated. As a result, incorrect system time deltas are calculated.
      
      If an interrupt occurred with an old PSW between .Lsysc_restore/.Lsysc_done
      or .Lio_restore/.Lio_done, update __LC_EXIT_TIMER with the system entry
      time of the interrupt.
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      768ae64b
    • Michael Holzheu's avatar
      s390/kdump: Add final note · 8c5157c1
      Michael Holzheu authored
      commit dcc00b79 upstream.
      
      Since linux v3.14 with commit 38dfac84 ("vmcore: prevent PT_NOTE
      p_memsz overflow during header update") on s390 we get the following
      message in the kdump kernel:
      
        Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b,
        n_descsz=0x6b6b6b6b
      
      The reason for this is that we don't create a final zero note in
      the ELF header, which the /proc/vmcore code uses to find the end
      of the notes section (see also kernel/kexec_core.c:final_note()).
      
      It still worked on s390 by chance because we (most of the time?) have the
      byte pattern 0x6b6b6b6b after the notes section, which also makes the notes
      parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
      interpreted as the note size:
      
        if ((real_sz + sz) > max_sz) {
                pr_warn("Warning: Exceeded p_memsz, dropping P ...);
                break;
        }
      
      So fix this and add the missing final note to the ELF header.
      We don't have to adjust the memory size for ELF header ("alloc_size")
      because the new ELF note still fits into the 0x1000 base memory.
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c5157c1
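The role of the final zero note in the commit above can be shown with a user-space sketch. The struct, helper names, and the `exceeded` flag are assumptions for the example; the walker is only loosely modeled on the behaviour the message describes (stop at a zero-sized name, or give up when a note would exceed the maximum size).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for an ELF note header (three 32-bit words). */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

static uint64_t align4(uint64_t n)
{
    return (n + 3) & ~(uint64_t)3;
}

/* Write one note at offset off and return the offset just past it. */
size_t put_note(unsigned char *buf, size_t off, const char *name,
                uint32_t type, const void *desc, uint32_t descsz)
{
    struct note_hdr h = { (uint32_t)strlen(name) + 1, descsz, type };

    memcpy(buf + off, &h, sizeof(h));
    off += sizeof(h);
    memcpy(buf + off, name, h.n_namesz);
    off += (size_t)align4(h.n_namesz);
    memcpy(buf + off, desc, descsz);
    off += (size_t)align4(descsz);
    return off;
}

/* Terminate the notes with an all-zero header (the "final note"). */
size_t put_final_note(unsigned char *buf, size_t off)
{
    memset(buf + off, 0, sizeof(struct note_hdr));
    return off + sizeof(struct note_hdr);
}

/* Walk the notes and return their total size. *exceeded is set to 1
 * when the walk stopped only because a bogus size overran max_sz. */
size_t notes_size(const unsigned char *buf, size_t max_sz, int *exceeded)
{
    size_t real_sz = 0;

    *exceeded = 0;
    while (real_sz + sizeof(struct note_hdr) <= max_sz) {
        struct note_hdr h;
        uint64_t sz;

        memcpy(&h, buf + real_sz, sizeof(h));
        if (h.n_namesz == 0)
            break; /* clean end: final zero note found */
        sz = sizeof(h) + align4(h.n_namesz) + align4(h.n_descsz);
        if (real_sz + sz > max_sz) {
            *exceeded = 1; /* stopped by chance, not by a terminator */
            break;
        }
        real_sz += (size_t)sz;
    }
    return real_sz;
}
```

If the buffer is pre-filled with 0x6b bytes and no final note is written, the walker reads a 0x6b6b6b6b name size after the last real note and stops with `exceeded` set. That returns the right size, but only by accident. With `put_final_note()` it stops cleanly on the zero header.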
    • Richard Cochran's avatar
      regulator: tps65023: Fix inverted core enable logic. · c849b4fa
      Richard Cochran authored
      commit c90722b5 upstream.
      
      Commit 43530b69 ("regulator: Use
      regmap_read/write(), regmap_update_bits functions directly") intended
      to replace working inline helper functions with standard regmap
      calls.  However, it also inverted the set/clear logic of the "CORE ADJ
      Allowed" bit.  That patch was clearly never tested, since without that
      bit cleared, the core VDCDC1 voltage output does not react to I2C
      configuration changes.
      
      This patch fixes the issue by clearing the bit as in the original,
      correct implementation.  Note for stable backporting that, due to
      subsequent driver churn, this patch will not apply on every kernel
      version.
      
      Fixes: 43530b69 ("regulator: Use regmap_read/write(), regmap_update_bits functions directly")
      Signed-off-by: Richard Cochran <rcochran@linutronix.de>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c849b4fa
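The set/clear inversion described in the commit above is easy to make with a masked update helper. The following user-space sketch models the usual read-modify-write semantics of such a helper. The function names and the bit position are hypothetical and are not taken from the tps65023 driver or its datasheet.

```c
#include <stdint.h>

/* Hypothetical bit position standing in for the "CORE ADJ Allowed" bit. */
#define MODEL_CORE_ADJ_BIT 0x01u

/* Masked update: bits inside mask take their value from val,
 * bits outside mask keep their value from reg. */
uint8_t model_update_bits(uint8_t reg, uint8_t mask, uint8_t val)
{
    return (uint8_t)((reg & ~mask) | (val & mask));
}

/* Clearing the bit: pass the mask with a value of 0. */
uint8_t model_core_adj_clear(uint8_t reg)
{
    return model_update_bits(reg, MODEL_CORE_ADJ_BIT, 0);
}

/* Setting the bit: pass the mask as the value too. This is the
 * inverted operation that the commit message describes as the bug. */
uint8_t model_core_adj_set(uint8_t reg)
{
    return model_update_bits(reg, MODEL_CORE_ADJ_BIT, MODEL_CORE_ADJ_BIT);
}
```

The two calls differ only in their last argument, which is why a mechanical conversion from inline helpers to masked-update calls can flip the logic without looking wrong in review.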
    • Wadim Egorov's avatar
      regulator: rk808: Fix RK818 LDO2 · 5b00d6c8
      Wadim Egorov authored
      commit 75f88115 upstream.
      
      Set the correct voltage select register for LDO2.
      Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b00d6c8
    • Linus Torvalds's avatar
      x86: fix 32-bit case of __get_user_asm_u64() · ae382caa
      Linus Torvalds authored
      commit 33c9e972 upstream.
      
      The code to fetch a 64-bit value from user space was entirely buggered,
      and has been since the code was merged in early 2016 in commit
      b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
      kernels").
      
      Happily the buggered routine is almost certainly entirely unused, since
      the normal way to access user space memory is just with the non-inlined
      "get_user()", and the inlined version didn't even historically exist.
      
      The normal "get_user()" case is handled by external hand-written asm in
      arch/x86/lib/getuser.S that doesn't have either of these issues.
      
      There were two independent bugs in __get_user_asm_u64():
      
       - it still did the STAC/CLAC user space access marking, even though
         that is now done by the wrapper macros, see commit 11f1a4b9
         ("x86: reorganize SMAP handling in user space accesses").
      
         This didn't result in a semantic error, it just means that the
         inlined optimized version was hugely less efficient than the
         allegedly slower standard version, since the CLAC/STAC overhead is
         quite high on modern Intel CPUs.
      
       - the double register %eax/%edx was marked as an output, but the %eax
         part of it was touched early in the asm, and could thus clobber other
         inputs to the asm that gcc didn't expect it to touch.
      
         In particular, that meant that the generated code could look like
         this:
      
              mov    (%eax),%eax
              mov    0x4(%eax),%edx
      
         where the load of %edx obviously was _supposed_ to be from the 32-bit
         word that followed the source of %eax, but because %eax was
         overwritten by the first instruction, the source of %edx was
         basically random garbage.
      
      The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
      the 64-bit output as early-clobber to let gcc know that no inputs should
      alias with the output register.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae382caa
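The early-clobber fix described in the commit above can be illustrated outside the kernel with GCC-style extended asm. This is a sketch under stated assumptions: the function name is made up, it uses two separate 32-bit outputs rather than the kernel's %eax/%edx register pair, and it falls back to plain C on compilers or architectures where the x86 asm does not apply.

```c
#include <stdint.h>

/* Load two consecutive 32-bit words from src into *lo and *hi. */
void load_pair(const uint32_t *src, uint32_t *lo, uint32_t *hi)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t l, h;

    /* "=&r" marks l as early-clobber: it is written before the last use
     * of the input, so the compiler must not give it the same register
     * as src. Without the '&', the first load could overwrite the
     * pointer and the second load would read from a garbage address. */
    __asm__ volatile(
        "movl (%2), %0\n\t"
        "movl 4(%2), %1"
        : "=&r"(l), "=r"(h)
        : "r"(src)
        : "memory");
    *lo = l;
    *hi = h;
#else
    /* Portable fallback so the example builds everywhere. */
    *lo = src[0];
    *hi = src[1];
#endif
}
```

The second output does not need the early-clobber marker because it is written by the final instruction, after the input has been read for the last time.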
    • Wanpeng Li's avatar
      KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation · 54e38543
      Wanpeng Li authored
      commit cbfc6c91 upstream.
      
      Huawei folks reported a read out-of-bounds vulnerability in KVM PIO emulation.
      
      - An "inb" instruction accessing the PIT Mode/Command register (ioport 0x43,
        write only, a read should be ignored) in the guest can return a random number.
      - A "rep insb" instruction accessing PIT register port 0x43 can make the memcpy()
        in emulator_pio_in_emulated() copy up to 0x400 bytes although only 1 byte was
        read, which discloses unimportant kernel memory of the host but causes no crash.
      
      A test program similar to the one below can reproduce the read out-of-bounds vulnerability:
      
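      #include <ctype.h>
      #include <err.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/io.h>
      
      /* 8 columns per row, matching the dumps shown below */
      #define HEXDUMP_COLS 8
      
      void hexdump(void *mem, unsigned int len)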
      {
              unsigned int i, j;
      
              for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
              {
                      /* print offset */
                      if(i % HEXDUMP_COLS == 0)
                      {
                              printf("0x%06x: ", i);
                      }
      
                      /* print hex data */
                      if(i < len)
                      {
                              printf("%02x ", 0xFF & ((char*)mem)[i]);
                      }
                      else /* end of block, just aligning for ASCII dump */
                      {
                              printf("   ");
                      }
      
                      /* print ASCII dump */
                      if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                      {
                              for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                              {
                                      if(j >= len) /* end of block, not really printing */
                                      {
                                              putchar(' ');
                                      }
                                      else if(isprint(((char*)mem)[j])) /* printable char */
                                      {
                                              putchar(0xFF & ((char*)mem)[j]);
                                      }
                                      else /* other char */
                                      {
                                              putchar('.');
                                      }
                              }
                              putchar('\n');
                      }
              }
      }
      
      int main(void)
      {
      	int i;
      	if (iopl(3))
      	{
      		err(1, "set iopl unsuccessfully\n");
      		return -1;
      	}
      	static char buf[0x40];
      
      	/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x40, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x41, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x42, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x43, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x44, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x45, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      
      	/* ins port 0x40 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x20, %rcx;");
      	asm volatile ("mov $0x40, %rdx;");
      	asm volatile ("rep insb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      
      	/* ins port 0x43 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x20, %rcx;");
      	asm volatile ("mov $0x43, %rdx;");
      	asm volatile ("rep insb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      	return 0;
      }
      
      The vcpu->arch.pio_data buffer is used by both in and out instruction emulation
      and is not cleared after use, so stale data is left over in the buffer. A guest
      read of port 0x43 is ignored since the port is write only; however, the function
      kernel_pio() can't distinguish this ignored read from a successful read of data
      from the device's ioport. No new data fills the buffer from port 0x43, yet
      emulator_pio_in_emulated() copies the stale data in the buffer to the guest
      unconditionally. This patch fixes it by clearing the buffer before in instruction
      emulation, so that the guest is never handed the stale data in the buffer.
      
      In addition, string I/O is not supported for in-kernel devices, so there is no
      iteration to read the ioport %RCX times for string I/O. The function kernel_pio()
      reads just one round, and then io size * %RCX bytes are copied to the guest
      unconditionally; in effect the guest receives one round of ioport data plus other
      stale data left over in the vcpu->arch.pio_data buffer. This patch fixes it by
      introducing string I/O support for in-kernel devices in order to hand the right
      ioport data to the guest.
      
      Before the patch:
      
      0x000000: fe 38 93 93 ff ff ab ab .8......
      0x000008: ab ab ab ab ab ab ab ab ........
      0x000010: ab ab ab ab ab ab ab ab ........
      0x000018: ab ab ab ab ab ab ab ab ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: f6 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
      0x000018: 30 30 20 33 20 20 20 20 00 3
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: f6 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
      0x000018: 30 30 20 33 20 20 20 20 00 3
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      After the patch:
      
      0x000000: 1e 02 f8 00 ff ff ab ab ........
      0x000008: ab ab ab ab ab ab ab ab ........
      0x000010: ab ab ab ab ab ab ab ab ........
      0x000018: ab ab ab ab ab ab ab ab ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: d2 e2 d2 df d2 db d2 d7 ........
      0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
      0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
      0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: 00 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 00 00 00 00 ........
      0x000018: 00 00 00 00 00 00 00 00 ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      Reported-by: Moguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Moguofang <moguofang@huawei.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      54e38543
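The two fixes described in the commit above (clear the shared buffer before an in emulation, and read once per string I/O iteration) can be modeled in user space. Everything here is a simplified stand-in: the struct, the device model that returns a fixed byte, the buffer size, and the function names are assumptions for the sketch and are not KVM's code.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_PIO_BUF 64

/* Stand-in for the per-vCPU bounce buffer shared by in and out emulation. */
struct model_vcpu {
    uint8_t pio_data[MODEL_PIO_BUF];
};

/* Model device: port 0x43 is write only, so a read is accepted but leaves
 * the destination untouched; any other port fills it with a fixed byte.
 * Both cases report success (0), so the caller cannot tell them apart. */
static int model_device_read(uint16_t port, uint8_t *dst, unsigned int size)
{
    if (port == 0x43)
        return 0;
    memset(dst, 0x5a, size);
    return 0;
}

/* Emulate "in" / "rep ins": copy size * count bytes to the guest. */
int model_pio_in(struct model_vcpu *vcpu, uint16_t port,
                 unsigned int size, unsigned int count, uint8_t *guest)
{
    size_t total = (size_t)size * count;
    unsigned int i;

    if (total > sizeof(vcpu->pio_data))
        return -1;

    /* Clear first, so an ignored read can never expose stale bytes. */
    memset(vcpu->pio_data, 0, total);

    /* One device read per iteration instead of a single round. */
    for (i = 0; i < count; i++) {
        if (model_device_read(port, vcpu->pio_data + (size_t)i * size, size))
            return -1;
    }

    memcpy(guest, vcpu->pio_data, total);
    return 0;
}
```

In this model, a string read from the write-only port hands the guest zeroes, which matches the shape of the third "after the patch" dump above, and a string read from a readable port hands the guest freshly read data for every iteration.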
    • Wanpeng Li's avatar
      KVM: x86: Fix potential preemption when get the current kvmclock timestamp · c996ad75
      Wanpeng Li authored
      commit e2c2206a upstream.
      
       BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/2809
       caller is __this_cpu_preempt_check+0x13/0x20
       CPU: 2 PID: 2809 Comm: qemu-system-x86 Not tainted 4.11.0+ #13
       Call Trace:
        dump_stack+0x99/0xce
        check_preemption_disabled+0xf5/0x100
        __this_cpu_preempt_check+0x13/0x20
        get_kvmclock_ns+0x6f/0x110 [kvm]
        get_time_ref_counter+0x5d/0x80 [kvm]
        kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
        ? kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
        ? kvm_arch_vcpu_ioctl_run+0xac9/0x1ce0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x5bf/0x1ce0 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
       RIP: 0033:0x7f9d164ed357
        ? __this_cpu_preempt_check+0x13/0x20
      
      This can be reproduced by running kvm-unit-tests/hyperv_stimer.flat w/
      CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.
      
      Safe access to per-CPU data requires a couple of constraints, though: the
      thread working with the data cannot be preempted and it cannot be migrated
      while it manipulates per-CPU variables. If the thread is preempted, the
      thread that replaces it could try to work with the same variables; migration
      to another CPU could also cause confusion. However, preemption is not
      disabled when the host per-CPU tsc rate is read to calculate the current
      kvmclock timestamp.
      
      This patch fixes it by utilizing a get_cpu/put_cpu pair to guarantee that both
      __this_cpu_read() and rdtsc() are not preempted.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c996ad75
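The get_cpu/put_cpu bracketing described in the commit above can only be imitated in user space, so the following is a toy model rather than kernel code. All names, the per-CPU table, and its values are hypothetical. The counter stands in for the kernel's preemption state, and the read helper refuses to run outside the bracket, mimicking the debug check that produced the BUG message.

```c
/* Toy preemption state: 0 means "preemptible". */
static int model_preempt_count;

/* Toy per-CPU data and the CPU the model thread currently runs on. */
static int model_current_cpu = 2;
static unsigned long model_tsc_khz[4] = { 2400000, 2400000, 2400000, 2400000 };

/* Disable "preemption" and report the current CPU. */
int model_get_cpu(void)
{
    model_preempt_count++;
    return model_current_cpu;
}

/* Re-enable "preemption". */
void model_put_cpu(void)
{
    model_preempt_count--;
}

/* Per-CPU read that fails (-1) when called while preemptible,
 * standing in for the debug check in the trace above. */
int model_this_cpu_read(unsigned long *out)
{
    if (model_preempt_count == 0)
        return -1;
    *out = model_tsc_khz[model_current_cpu];
    return 0;
}

/* The pattern from the fix: bracket the per-CPU access. */
int model_read_tsc_khz(unsigned long *out)
{
    int ret;

    model_get_cpu();
    ret = model_this_cpu_read(out);
    model_put_cpu();
    return ret;
}
```

Calling `model_this_cpu_read()` directly fails in this model, while `model_read_tsc_khz()` succeeds and leaves the counter balanced, which is the property the real pair is used for.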