1. 25 May, 2017 40 commits
    • Vaibhav Jain's avatar
      cxl: Route eeh events to all drivers in cxl_pci_error_detected() · 525d5ef5
      Vaibhav Jain authored
      commit 4f58f0bf upstream.
      
      Fix a boundary condition where in some cases an eeh event that results
      in card reset isn't passed on to a driver attached to the virtual PCI
      device associated with a slice. This will happen in case when a slice
      attached device driver returns a value other than
      PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
      would result in an early return from cxl_pci_error_detected() and
      other drivers attached to other AFUs on the card wont be notified.
      
      The patch fixes this by making sure that all slice attached
      device-drivers are notified and the return values from
      error_detected() callback are aggregated in a scheme where request for
      'disconnect' trumps all and 'none' trumps 'need_reset'.
      
      Fixes: 9e8df8a2 ("cxl: EEH support")
      Signed-off-by: default avatarVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: default avatarFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      525d5ef5
    • Vaibhav Jain's avatar
      cxl: Force context lock during EEH flow · 4a0e8757
      Vaibhav Jain authored
      commit ea9a26d1 upstream.
      
      During an eeh event when the cxl card is fenced and card sysfs attr
      perst_reloads_same_image is set following warning message is seen in the
      kernel logs:
      
        Adapter context unlocked with 0 active contexts
        ------------[ cut here ]------------
        WARNING: CPU: 12 PID: 627 at
        ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
      
      Even though this warning is harmless, it clutters the kernel log
      during an eeh event. This warning is triggered as the EEH callback
      cxl_pci_error_detected doesn't obtain a context-lock before forcibly
      detaching all active context and when context-lock is released during
      call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
      cxl_adapter_context_unlock is triggered.
      
      To fix this warning, we acquire the adapter context-lock via
      cxl_adapter_context_lock() in the eeh callback
      cxl_pci_error_detected() once all the virtual AFU PHBs are notified
      and their contexts detached. The context-lock is released in
      cxl_pci_slot_reset() after the adapter is successfully reconfigured
      and before the we call the slot_reset callback on slice attached
      device-drivers.
      
      Fixes: 70b565bb ("cxl: Prevent adapter reset if an active context exists")
      Reported-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: default avatarFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: default avatarMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Tested-by: default avatarUma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a0e8757
    • Gerd Hoffmann's avatar
      ohci-pci: add qemu quirk · 6c58e144
      Gerd Hoffmann authored
      commit 21a60f6e upstream.
      
      On a loaded virtualization host (dozen guests booting at the same time)
      it may happen that the ohci controller emulation doesn't manage to do
      timely frame processing, with the result that the io watchdog fires and
      considers the controller being dead, even though it's only the emulation
      being unusual slow due to the load peak.
      
      So, add a quirk for qemu and don't use the watchdog in case we figure we
      are running on emulated ohci.  The virtual ohci controller masquerades
      as apple ohci controller, but we can identify it by subsystem id.
      Signed-off-by: default avatarGerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c58e144
    • Tobias Herzog's avatar
      cdc-acm: fix possible invalid access when processing notification · a83f3099
      Tobias Herzog authored
      commit 1bb9914e upstream.
      
      Notifications may only be 8 bytes long. Accessing the 9th and
      10th byte of unimplemented/unknown notifications may be insecure.
      Also check the length of known notifications before accessing anything
      behind the 8th byte.
      Signed-off-by: default avatarTobias Herzog <t-herzog@gmx.de>
      Acked-by: default avatarOliver Neukum <oneukum@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a83f3099
    • David Rivshin's avatar
      gpio: omap: return error if requested debounce time is not possible · 9541621a
      David Rivshin authored
      commit 83977443 upstream.
      
      omap_gpio_debounce() does not validate that the requested debounce
      is within a range it can handle. Instead it lets the register value
      wrap silently, and always returns success.
      
      This can lead to all sorts of unexpected behavior, such as gpio_keys
      asking for a too-long debounce, but getting a very short debounce in
      practice.
      
      Fix this by returning -EINVAL if the requested value does not fit into
      the register field. If there is no debounce clock available at all,
      return -ENOTSUPP.
      
      Fixes: e85ec6c3 ("gpio: omap: fix omap2_set_gpio_debounce")
      Signed-off-by: default avatarDavid Rivshin <drivshin@allworx.com>
      Acked-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9541621a
    • Ben Skeggs's avatar
      drm/nouveau/tmr: handle races with hw when updating the next alarm time · 33f379db
      Ben Skeggs authored
      commit 1b0f8438 upstream.
      
      If the time to the next alarm is short enough, we could race with HW and
      end up with an ~4 second delay until it triggers.
      
      Fix this by checking again after we update HW.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33f379db
    • Ben Skeggs's avatar
      drm/nouveau/tmr: avoid processing completed alarms when adding a new one · 53b8e320
      Ben Skeggs authored
      commit 330bdf62 upstream.
      
      The idea here was to avoid having to "manually" program the HW if there's
      a new earliest alarm.  This was lazy and bad, as it leads to loads of fun
      races between inter-related callers (ie. therm).
      
      Turns out, it's not so difficult after all.  Go figure ;)
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53b8e320
    • Ben Skeggs's avatar
      drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm · 952030b8
      Ben Skeggs authored
      commit 9fc64667 upstream.
      
      At least therm/fantog "attempts" to work around this issue, which could
      lead to corruption of the pending alarm list.
      
      Fix it properly by not updating the timestamp without the lock held, or
      trying to add an already pending alarm to the pending alarm list....
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      952030b8
    • Ben Skeggs's avatar
      drm/nouveau/tmr: ack interrupt before processing alarms · 4c168033
      Ben Skeggs authored
      commit 3733bd8b upstream.
      
      Fixes a race where we can miss an alarm that triggers while we're already
      processing previous alarms.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c168033
    • Ben Skeggs's avatar
      drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes · a5e41c9e
      Ben Skeggs authored
      commit e6db9579 upstream.
      
      The DRM core used to only call prepare_fb/cleanup_fb() when a plane's
      framebuffer changed, which achieved the desired effect.
      
      It's apparently now up to the driver to decide on its own.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5e41c9e
    • Ben Skeggs's avatar
      drm/nouveau/kms/nv50: fix source-rect-only plane updates · 4aecf3a7
      Ben Skeggs authored
      commit 36601c2b upstream.
      
      This "optimisation" (which was originally meant to skip updating cursor
      settings in the core channel on position-only updates) turned out to be
      pointless in the final design of the code before it was merged.
      
      Remove it completely, as it breaks other cases.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aecf3a7
    • Ben Skeggs's avatar
      drm/nouveau/therm: remove ineffective workarounds for alarm bugs · 2138faf0
      Ben Skeggs authored
      commit e4311ee5 upstream.
      
      These were ineffective due to touching the list without the alarm lock,
      but should no longer be required.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2138faf0
    • Mario Kleiner's avatar
      drm/amdgpu: Add missing lb_vblank_lead_lines setup to DCE-6 path. · 21d3947b
      Mario Kleiner authored
      commit effaf848 upstream.
      
      This apparently got lost when implementing the new DCE-6 support
      and would cause failures in pageflip scheduling and timestamping.
      Signed-off-by: default avatarMario Kleiner <mario.kleiner.de@gmail.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21d3947b
    • Mario Kleiner's avatar
      drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations. · 3f0db81c
      Mario Kleiner authored
      commit e190ed1e upstream.
      
      At dot clocks > approx. 250 Mhz, some of these calcs will overflow and
      cause miscalculation of latency watermarks, and for some overflows also
      divide-by-zero driver crash ("divide error: 0000 [#1] PREEMPT SMP" in
      "dce_v10_0_latency_watermark+0x12d/0x190").
      
      This zero-divide happened, e.g., on AMD Tonga Pro under DCE-10,
      on a Displayport panel when trying to set a video mode of 2560x1440
      at 165 Hz vrefresh with a dot clock of 635.540 Mhz.
      
      Refine calculations to avoid the overflows.
      
      Tested for DCE-10 with R9 380 Tonga + ASUS ROG PG279 panel.
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarMario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f0db81c
    • Mario Kleiner's avatar
      drm/amdgpu: Make display watermark calculations more accurate · de2fa45a
      Mario Kleiner authored
      commit d63c277d upstream.
      
      Avoid big roundoff errors in scanline/hactive durations for
      high pixel clocks, especially for >= 500 Mhz, and thereby
      program more accurate display fifo watermarks.
      
      Implemented here for DCE 6,8,10,11.
      Successfully tested on DCE 10 with AMD R9 380 Tonga.
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarMario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de2fa45a
    • Johan Hovold's avatar
      ath9k_htc: fix NULL-deref at probe · fab44023
      Johan Hovold authored
      commit ebeb3667 upstream.
      
      Make sure to check the number of endpoints to avoid dereferencing a
      NULL-pointer or accessing memory beyond the endpoint array should a
      malicious device lack the expected endpoints.
      
      Fixes: 36bcce43 ("ath9k_htc: Handle storage devices")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarKalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fab44023
    • Dmitry Tunin's avatar
      ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device · 43068b45
      Dmitry Tunin authored
      commit 16ff1fb0 upstream.
      
      T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=01 Dev#=  7 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=ff MxPS=64 #Cfgs=  1
      P:  Vendor=1eda ProdID=2315 Rev=01.08
      S:  Manufacturer=ATHEROS
      S:  Product=USB2.0 WLAN
      S:  SerialNumber=12345
      C:  #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      Signed-off-by: default avatarDmitry Tunin <hanipouspilot@gmail.com>
      Signed-off-by: default avatarKalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43068b45
    • Martin Schwidefsky's avatar
      s390/cputime: fix incorrect system time · 82e31de0
      Martin Schwidefsky authored
      commit 07a63cbe upstream.
      
      git commit c5328901 "[S390] entry[64].S improvements" removed
      the update of the exit_timer lowcore field from the critical section
      cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done
      blocks. If the PSW is updated by the critical section cleanup to point to
      user space again, the interrupt entry code will do a vtime calculation
      after the cleanup completed with an exit_timer value which has *not* been
      updated. Due to this incorrect system time deltas are calculated.
      
      If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done
      or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry
      time of the interrupt.
      Tested-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82e31de0
    • Michael Holzheu's avatar
      s390/kdump: Add final note · 5d1adbaf
      Michael Holzheu authored
      commit dcc00b79 upstream.
      
      Since linux v3.14 with commit 38dfac84 ("vmcore: prevent PT_NOTE
      p_memsz overflow during header update") on s390 we get the following
      message in the kdump kernel:
      
        Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b,
        n_descsz=0x6b6b6b6b
      
      The reason for this is that we don't create a final zero note in
      the ELF header which the proc/vmcore code uses to find out the end
      of the notes section (see also kernel/kexec_core.c:final_note()).
      
      It still worked on s390 by chance because we (most of the time?) have the
      byte pattern 0x6b6b6b6b after the notes section which also makes the notes
      parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
      interpreded as note size:
      
        if ((real_sz + sz) > max_sz) {
                pr_warn("Warning: Exceeded p_memsz, dropping P ...);
                break;
        }
      
      So fix this and add the missing final note to the ELF header.
      We don't have to adjust the memory size for ELF header ("alloc_size")
      because the new ELF note still fits into the 0x1000 base memory.
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d1adbaf
    • Richard Cochran's avatar
      regulator: tps65023: Fix inverted core enable logic. · d2cc3a8c
      Richard Cochran authored
      commit c90722b5 upstream.
      
      Commit 43530b69 ("regulator: Use
      regmap_read/write(), regmap_update_bits functions directly") intended
      to replace working inline helper functions with standard regmap
      calls.  However, it also inverted the set/clear logic of the "CORE ADJ
      Allowed" bit.  That patch was clearly never tested, since without that
      bit cleared, the core VDCDC1 voltage output does not react to I2C
      configuration changes.
      
      This patch fixes the issue by clearing the bit as in the original,
      correct implementation.  Note for stable back porting that, due to
      subsequent driver churn, this patch will not apply on every kernel
      version.
      
      Fixes: 43530b69 ("regulator: Use regmap_read/write(), regmap_update_bits functions directly")
      Signed-off-by: default avatarRichard Cochran <rcochran@linutronix.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2cc3a8c
    • Wadim Egorov's avatar
      regulator: rk808: Fix RK818 LDO2 · fde4f790
      Wadim Egorov authored
      commit 75f88115 upstream.
      
      Set the correct voltage select register for LDO2.
      Signed-off-by: default avatarWadim Egorov <w.egorov@phytec.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fde4f790
    • Linus Torvalds's avatar
      x86: fix 32-bit case of __get_user_asm_u64() · efb209c8
      Linus Torvalds authored
      commit 33c9e972 upstream.
      
      The code to fetch a 64-bit value from user space was entirely buggered,
      and has been since the code was merged in early 2016 in commit
      b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
      kernels").
      
      Happily the buggered routine is almost certainly entirely unused, since
      the normal way to access user space memory is just with the non-inlined
      "get_user()", and the inlined version didn't even historically exist.
      
      The normal "get_user()" case is handled by external hand-written asm in
      arch/x86/lib/getuser.S that doesn't have either of these issues.
      
      There were two independent bugs in __get_user_asm_u64():
      
       - it still did the STAC/CLAC user space access marking, even though
         that is now done by the wrapper macros, see commit 11f1a4b9
         ("x86: reorganize SMAP handling in user space accesses").
      
         This didn't result in a semantic error, it just means that the
         inlined optimized version was hugely less efficient than the
         allegedly slower standard version, since the CLAC/STAC overhead is
         quite high on modern Intel CPU's.
      
       - the double register %eax/%edx was marked as an output, but the %eax
         part of it was touched early in the asm, and could thus clobber other
         inputs to the asm that gcc didn't expect it to touch.
      
         In particular, that meant that the generated code could look like
         this:
      
              mov    (%eax),%eax
              mov    0x4(%eax),%edx
      
         where the load of %edx obviously was _supposed_ to be from the 32-bit
         word that followed the source of %eax, but because %eax was
         overwritten by the first instruction, the source of %edx was
         basically random garbage.
      
      The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
      the 64-bit output as early-clobber to let gcc know that no inputs should
      alias with the output register.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efb209c8
    • Wanpeng Li's avatar
      KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation · 5bbaf88d
      Wanpeng Li authored
      commit cbfc6c91 upstream.
      
      Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.
      
      - "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
        a read should be ignored) in guest can get a random number.
      - "rep insb" instruction to access PIT register port 0x43 can control memcpy()
        in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
        which will disclose the unimportant kernel memory in host but no crash.
      
      The similar test program below can reproduce the read out-of-bounds vulnerability:
      
      void hexdump(void *mem, unsigned int len)
      {
              unsigned int i, j;
      
              for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
              {
                      /* print offset */
                      if(i % HEXDUMP_COLS == 0)
                      {
                              printf("0x%06x: ", i);
                      }
      
                      /* print hex data */
                      if(i < len)
                      {
                              printf("%02x ", 0xFF & ((char*)mem)[i]);
                      }
                      else /* end of block, just aligning for ASCII dump */
                      {
                              printf("   ");
                      }
      
                      /* print ASCII dump */
                      if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                      {
                              for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                              {
                                      if(j >= len) /* end of block, not really printing */
                                      {
                                              putchar(' ');
                                      }
                                      else if(isprint(((char*)mem)[j])) /* printable char */
                                      {
                                              putchar(0xFF & ((char*)mem)[j]);
                                      }
                                      else /* other char */
                                      {
                                              putchar('.');
                                      }
                              }
                              putchar('\n');
                      }
              }
      }
      
      int main(void)
      {
      	int i;
      	if (iopl(3))
      	{
      		err(1, "set iopl unsuccessfully\n");
      		return -1;
      	}
      	static char buf[0x40];
      
      	/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x40, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x41, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x42, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x43, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x44, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("mov $0x45, %rdx;");
      	asm volatile ("in %dx,%al;");
      	asm volatile ("stosb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      
      	/* ins port 0x40 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x20, %rcx;");
      	asm volatile ("mov $0x40, %rdx;");
      	asm volatile ("rep insb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      
      	/* ins port 0x43 */
      
      	memset(buf, 0xab, sizeof(buf));
      
      	asm volatile("push %rdi;");
      	asm volatile("mov %0, %%rdi;"::"q"(buf));
      
      	asm volatile ("mov $0x20, %rcx;");
      	asm volatile ("mov $0x43, %rdx;");
      	asm volatile ("rep insb;");
      
      	asm volatile ("pop %rdi;");
      	hexdump(buf, 0x40);
      
      	printf("\n");
      	return 0;
      }
      
      The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
      w/o clear after using which results in some random datas are left over in
      the buffer. Guest reads port 0x43 will be ignored since it is write only,
      however, the function kernel_pio() can't distigush this ignore from successfully
      reads data from device's ioport. There is no new data fill the buffer from
      port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
      the buffer to the guest unconditionally. This patch fixes it by clearing the
      buffer before in instruction emulation to avoid to grant guest the stale data
      in the buffer.
      
      In addition, string I/O is not supported for in kernel device. So there is no
      iteration to read ioport %RCX times for string I/O. The function kernel_pio()
      just reads one round, and then copy the io size * %RCX to the guest unconditionally,
      actually it copies the one round ioport data w/ other random datas which are left
      over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
      introducing the string I/O support for in kernel device in order to grant the right
      ioport datas to the guest.
      
      Before the patch:
      
      0x000000: fe 38 93 93 ff ff ab ab .8......
      0x000008: ab ab ab ab ab ab ab ab ........
      0x000010: ab ab ab ab ab ab ab ab ........
      0x000018: ab ab ab ab ab ab ab ab ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: f6 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
      0x000018: 30 30 20 33 20 20 20 20 00 3
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: f6 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
      0x000018: 30 30 20 33 20 20 20 20 00 3
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      After the patch:
      
      0x000000: 1e 02 f8 00 ff ff ab ab ........
      0x000008: ab ab ab ab ab ab ab ab ........
      0x000010: ab ab ab ab ab ab ab ab ........
      0x000018: ab ab ab ab ab ab ab ab ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: d2 e2 d2 df d2 db d2 d7 ........
      0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
      0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
      0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      
      0x000000: 00 00 00 00 00 00 00 00 ........
      0x000008: 00 00 00 00 00 00 00 00 ........
      0x000010: 00 00 00 00 00 00 00 00 ........
      0x000018: 00 00 00 00 00 00 00 00 ........
      0x000020: ab ab ab ab ab ab ab ab ........
      0x000028: ab ab ab ab ab ab ab ab ........
      0x000030: ab ab ab ab ab ab ab ab ........
      0x000038: ab ab ab ab ab ab ab ab ........
      Reported-by: default avatarMoguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Moguofang <moguofang@huawei.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bbaf88d
    • Wanpeng Li's avatar
      KVM: x86: Fix potential preemption when get the current kvmclock timestamp · b67dcf8c
      Wanpeng Li authored
      commit e2c2206a upstream.
      
       BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/2809
       caller is __this_cpu_preempt_check+0x13/0x20
       CPU: 2 PID: 2809 Comm: qemu-system-x86 Not tainted 4.11.0+ #13
       Call Trace:
        dump_stack+0x99/0xce
        check_preemption_disabled+0xf5/0x100
        __this_cpu_preempt_check+0x13/0x20
        get_kvmclock_ns+0x6f/0x110 [kvm]
        get_time_ref_counter+0x5d/0x80 [kvm]
        kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
        ? kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
        ? kvm_arch_vcpu_ioctl_run+0xac9/0x1ce0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x5bf/0x1ce0 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
       RIP: 0033:0x7f9d164ed357
        ? __this_cpu_preempt_check+0x13/0x20
      
      This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/
      CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.
      
      Safe access to per-CPU data requires a couple of constraints, though: the
      thread working with the data cannot be preempted and it cannot be migrated
      while it manipulates per-CPU variables. If the thread is preempted, the
      thread that replaces it could try to work with the same variables; migration
      to another CPU could also cause confusion. However there is no preemption
      disable when reads host per-CPU tsc rate to calculate the current kvmclock
      timestamp.
      
      This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both
      __this_cpu_read() and rdtsc() are not preempted.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b67dcf8c
    • Wanpeng Li's avatar
      KVM: x86: Fix load damaged SSEx MXCSR register · 8223e8f9
      Wanpeng Li authored
      commit a575813b upstream.
      
      Reported by syzkaller:
      
         BUG: unable to handle kernel paging request at ffffffffc07f6a2e
         IP: report_bug+0x94/0x120
         PGD 348e12067
         P4D 348e12067
         PUD 348e14067
         PMD 3cbd84067
         PTE 80000003f7e87161
      
         Oops: 0003 [#1] SMP
         CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G           OE   4.11.0+ #8
         task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
         RIP: 0010:report_bug+0x94/0x120
         RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
          do_trap+0x156/0x170
          do_error_trap+0xa3/0x170
          ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
          ? mark_held_locks+0x79/0xa0
          ? retint_kernel+0x10/0x10
          ? trace_hardirqs_off_thunk+0x1a/0x1c
          do_invalid_op+0x20/0x30
          invalid_op+0x1e/0x30
         RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
          ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
          kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
          kvm_vcpu_ioctl+0x384/0x780 [kvm]
          ? kvm_vcpu_ioctl+0x384/0x780 [kvm]
          ? sched_clock+0x13/0x20
          ? __do_page_fault+0x2a0/0x550
          do_vfs_ioctl+0xa4/0x700
          ? up_read+0x1f/0x40
          ? __do_page_fault+0x2a0/0x550
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
      
      SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
      a 1 to any of these bits will cause a general-protection exception(#GP) to be
      generated". The syzkaller forks' testcase overrides xsave area w/ random values
      and steps on the reserved bits of MXCSR register. The damaged MXCSR register
      values of guest will be restored to SSEx MXCSR register before vmentry. This
      patch fixes it by catching userspace override MXCSR register reserved bits w/
      random values and bails out immediately.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8223e8f9
    • Daniel Glöckner's avatar
      ima: accept previously set IMA_NEW_FILE · efa8cd1e
      Daniel Glöckner authored
      commit 1ac202e9 upstream.
      
      Modifying the attributes of a file makes ima_inode_post_setattr reset
      the IMA cache flags. So if the file, which has just been created,
      is opened a second time before the first file descriptor is closed,
      verification fails since the security.ima xattr has not been written
      yet. We therefore have to look at the IMA_NEW_FILE even if the file
      already existed.
      
      With this patch there should no longer be an error when cat tries to
      open testfile:
      
      $ rm -f testfile
      $ ( echo test >&3 ; touch testfile ; cat testfile ) 3>testfile
      
      A file being new is no reason to accept that it is missing a digital
      signature demanded by the policy.
      Signed-off-by: default avatarDaniel Glöckner <dg@emlix.com>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efa8cd1e
    • Brian Norris's avatar
      mwifiex: pcie: fix cmd_buf use-after-free in remove/reset · d5811285
      Brian Norris authored
      commit 3c8cb9ad upstream.
      
      Command buffers (skb's) are allocated by the main driver, and freed upon
      the last use. That last use is often in mwifiex_free_cmd_buffer(). In
      the meantime, if the command buffer gets used by the PCI driver, we map
      it as DMA-able, and store the mapping information in the 'cb' memory.
      
      However, if a command was in-flight when resetting the device (and
      therefore was still mapped), we don't get a chance to unmap this memory
      until after the core has cleaned up its command handling.
      
      Let's keep a refcount within the PCI driver, so we ensure the memory
      only gets freed after we've finished unmapping it.
      
      Noticed by KASAN when forcing a reset via:
      
        echo 1 > /sys/bus/pci/.../reset
      
      The same code path can presumably be exercised in remove() and
      shutdown().
      
      [  205.390377] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
      [  205.400393] ==================================================================
      [  205.407719] BUG: KASAN: use-after-free in mwifiex_unmap_pci_memory.isra.14+0x4c/0x100 [mwifiex_pcie] at addr ffffffc0ad471b28
      [  205.419040] Read of size 16 by task bash/1913
      [  205.423421] =============================================================================
      [  205.431625] BUG skbuff_head_cache (Tainted: G    B          ): kasan: bad access detected
      [  205.439815] -----------------------------------------------------------------------------
      [  205.439815]
      [  205.449534] INFO: Allocated in __build_skb+0x48/0x114 age=1311 cpu=4 pid=1913
      [  205.456709] 	alloc_debug_processing+0x124/0x178
      [  205.461282] 	___slab_alloc.constprop.58+0x528/0x608
      [  205.466196] 	__slab_alloc.isra.54.constprop.57+0x44/0x54
      [  205.471542] 	kmem_cache_alloc+0xcc/0x278
      [  205.475497] 	__build_skb+0x48/0x114
      [  205.479019] 	__netdev_alloc_skb+0xe0/0x170
      [  205.483244] 	mwifiex_alloc_cmd_buffer+0x68/0xdc [mwifiex]
      [  205.488759] 	mwifiex_init_fw+0x40/0x6cc [mwifiex]
      [  205.493584] 	_mwifiex_fw_dpc+0x158/0x520 [mwifiex]
      [  205.498491] 	mwifiex_reinit_sw+0x2c4/0x398 [mwifiex]
      [  205.503510] 	mwifiex_pcie_reset_notify+0x114/0x15c [mwifiex_pcie]
      [  205.509643] 	pci_reset_notify+0x5c/0x6c
      [  205.513519] 	pci_reset_function+0x6c/0x7c
      [  205.517567] 	reset_store+0x68/0x98
      [  205.521003] 	dev_attr_store+0x54/0x60
      [  205.524705] 	sysfs_kf_write+0x9c/0xb0
      [  205.528413] INFO: Freed in __kfree_skb+0xb0/0xbc age=131 cpu=4 pid=1913
      [  205.535064] 	free_debug_processing+0x264/0x370
      [  205.539550] 	__slab_free+0x84/0x40c
      [  205.543075] 	kmem_cache_free+0x1c8/0x2a0
      [  205.547030] 	__kfree_skb+0xb0/0xbc
      [  205.550465] 	consume_skb+0x164/0x178
      [  205.554079] 	__dev_kfree_skb_any+0x58/0x64
      [  205.558304] 	mwifiex_free_cmd_buffer+0xa0/0x158 [mwifiex]
      [  205.563817] 	mwifiex_shutdown_drv+0x578/0x5c4 [mwifiex]
      [  205.569164] 	mwifiex_shutdown_sw+0x178/0x310 [mwifiex]
      [  205.574353] 	mwifiex_pcie_reset_notify+0xd4/0x15c [mwifiex_pcie]
      [  205.580398] 	pci_reset_notify+0x5c/0x6c
      [  205.584274] 	pci_dev_save_and_disable+0x24/0x6c
      [  205.588837] 	pci_reset_function+0x30/0x7c
      [  205.592885] 	reset_store+0x68/0x98
      [  205.596324] 	dev_attr_store+0x54/0x60
      [  205.600017] 	sysfs_kf_write+0x9c/0xb0
      ...
      [  205.800488] Call trace:
      [  205.802980] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
      [  205.808415] [<ffffffc00020a96c>] show_stack+0x20/0x28
      [  205.813506] [<ffffffc0005d020c>] dump_stack+0xa4/0xcc
      [  205.818598] [<ffffffc0003be44c>] print_trailer+0x158/0x168
      [  205.824120] [<ffffffc0003be5f0>] object_err+0x4c/0x5c
      [  205.829210] [<ffffffc0003c45bc>] kasan_report+0x334/0x500
      [  205.834641] [<ffffffc0003c3994>] check_memory_region+0x20/0x14c
      [  205.840593] [<ffffffc0003c3b14>] __asan_loadN+0x14/0x1c
      [  205.845879] [<ffffffbffc46171c>] mwifiex_unmap_pci_memory.isra.14+0x4c/0x100 [mwifiex_pcie]
      [  205.854282] [<ffffffbffc461864>] mwifiex_pcie_delete_cmdrsp_buf+0x94/0xa8 [mwifiex_pcie]
      [  205.862421] [<ffffffbffc462028>] mwifiex_pcie_free_buffers+0x11c/0x158 [mwifiex_pcie]
      [  205.870302] [<ffffffbffc4620d4>] mwifiex_pcie_down_dev+0x70/0x80 [mwifiex_pcie]
      [  205.877736] [<ffffffbffc1397a8>] mwifiex_shutdown_sw+0x190/0x310 [mwifiex]
      [  205.884658] [<ffffffbffc4606b4>] mwifiex_pcie_reset_notify+0xd4/0x15c [mwifiex_pcie]
      [  205.892446] [<ffffffc000635f54>] pci_reset_notify+0x5c/0x6c
      [  205.898048] [<ffffffc00063a044>] pci_dev_save_and_disable+0x24/0x6c
      [  205.904350] [<ffffffc00063cf0c>] pci_reset_function+0x30/0x7c
      [  205.910134] [<ffffffc000641118>] reset_store+0x68/0x98
      [  205.915312] [<ffffffc000771588>] dev_attr_store+0x54/0x60
      [  205.920750] [<ffffffc00046f53c>] sysfs_kf_write+0x9c/0xb0
      [  205.926182] [<ffffffc00046dfb0>] kernfs_fop_write+0x184/0x1f8
      [  205.931963] [<ffffffc0003d64f4>] __vfs_write+0x6c/0x17c
      [  205.937221] [<ffffffc0003d7164>] vfs_write+0xf0/0x1c4
      [  205.942310] [<ffffffc0003d7da0>] SyS_write+0x78/0xd8
      [  205.947312] [<ffffffc000204634>] el0_svc_naked+0x24/0x28
      ...
      [  205.998268] ==================================================================
      
      This bug has been around in different forms for a while. It was sort of
      noticed in commit 955ab095 ("mwifiex: Do not kfree cmd buf while
      unregistering PCIe"), but it just fixed the double-free, without
      acknowledging the potential for use-after-free.
      
      Fixes: fc331460 ("mwifiex: use pci_alloc/free_consistent APIs for PCIe")
      Signed-off-by: default avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5811285
    • Brian Norris's avatar
      mwifiex: MAC randomization should not be persistent · a32a0c82
      Brian Norris authored
      commit 7e2f18f0 upstream.
      
      nl80211 provides the NL80211_SCAN_FLAG_RANDOM_ADDR for every scan
      request that should be randomized; the absence of such a flag means we
      should not randomize. However, mwifiex was stashing the latest
      randomization request and *always* using it for future scans, even those
      that didn't set the flag.
      
      Let's zero out the randomization info whenever we get a scan request
      without NL80211_SCAN_FLAG_RANDOM_ADDR. I'd prefer to remove
      priv->random_mac entirely (and plumb the randomization MAC properly
      through the call sequence), but the spaghetti is a little difficult to
      unravel here for me.
      
      Fixes: c2a8f0ff ("mwifiex: support random MAC address for scanning")
      Signed-off-by: default avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a32a0c82
    • Larry Finger's avatar
      rtlwifi: rtl8821ae: setup 8812ae RFE according to device type · cf6df2bc
      Larry Finger authored
      commit 46cfa214 upstream.
      
      Current channel switch implementation sets 8812ae RFE reg value assuming
      that device always has type 2.
      
      Extend possible RFE types set and write corresponding reg values.
      
      Source for new code is
      http://dlcdnet.asus.com/pub/ASUS/wireless/PCE-AC51/DR_PCE_AC51_20232801152016.zipSigned-off-by: default avatarMaxim Samoylov <max7255@gmail.com>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Yan-Hsuan Chuang <yhchuang@realtek.com>
      Cc: Pkshih <pkshih@realtek.com>
      Cc: Birming Chiu <birming@realtek.com>
      Cc: Shaofu <shaofu@realtek.com>
      Cc: Steven Ting <steventing@realtek.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf6df2bc
    • NeilBrown's avatar
      md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop · 699ae096
      NeilBrown authored
      commit 065e519e upstream.
      
      if called md_set_readonly and set MD_CLOSING bit, the mddev cannot
      be opened any more due to the MD_CLOING bit wasn't cleared. Thus it
      needs to be cleared in md_ioctl after any call to md_set_readonly()
      or do_md_stop().
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Fixes: af8d8e6f ("md: changes for MD_STILL_CLOSED flag")
      Signed-off-by: default avatarZhilong Liu <zlliu@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      699ae096
    • Dennis Yang's avatar
      md: update slab_cache before releasing new stripes when stripes resizing · 1800fde3
      Dennis Yang authored
      commit 583da48e upstream.
      
      When growing raid5 device on machine with small memory, there is chance that
      mdadm will be killed and the following bug report can be observed. The same
      bug could also be reproduced in linux-4.10.6.
      
      [57600.075774] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [57600.083796] IP: [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
      [57600.110378] PGD 421cf067 PUD 4442d067 PMD 0
      [57600.114678] Oops: 0002 [#1] SMP
      [57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P           O    4.2.8 #1
      [57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013
      [57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000
      [57600.204963] RIP: 0010:[<ffffffff81a6aa87>]  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
      [57600.213057] RSP: 0018:ffff880043073810  EFLAGS: 00010046
      [57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0
      [57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000
      [57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282
      [57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838
      [57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00
      [57600.253999] FS:  00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
      [57600.262078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0
      [57600.274942] Stack:
      [57600.276949]  ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f
      [57600.284383]  ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98
      [57600.291820]  ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968
      [57600.299254] Call Trace:
      [57600.301698]  [<ffffffff8114ee35>] ? cache_flusharray+0x35/0xe0
      [57600.307523]  [<ffffffff81119043>] ? __page_cache_release+0x23/0x110
      [57600.313779]  [<ffffffff8114eb53>] kmem_cache_free+0x63/0xc0
      [57600.319344]  [<ffffffff81579942>] drop_one_stripe+0x62/0x90
      [57600.324915]  [<ffffffff81579b5b>] raid5_cache_scan+0x8b/0xb0
      [57600.330563]  [<ffffffff8111b98a>] shrink_slab.part.36+0x19a/0x250
      [57600.336650]  [<ffffffff8111e38c>] shrink_zone+0x23c/0x250
      [57600.342039]  [<ffffffff8111e4f3>] do_try_to_free_pages+0x153/0x420
      [57600.348210]  [<ffffffff8111e851>] try_to_free_pages+0x91/0xa0
      [57600.353959]  [<ffffffff811145b1>] __alloc_pages_nodemask+0x4d1/0x8b0
      [57600.360303]  [<ffffffff8157a30b>] check_reshape+0x62b/0x770
      [57600.365866]  [<ffffffff8157a4a5>] raid5_check_reshape+0x55/0xa0
      [57600.371778]  [<ffffffff81583df7>] update_raid_disks+0xc7/0x110
      [57600.377604]  [<ffffffff81592b73>] md_ioctl+0xd83/0x1b10
      [57600.382827]  [<ffffffff81385380>] blkdev_ioctl+0x170/0x690
      [57600.388307]  [<ffffffff81195238>] block_ioctl+0x38/0x40
      [57600.393525]  [<ffffffff811731c5>] do_vfs_ioctl+0x2b5/0x480
      [57600.399010]  [<ffffffff8115e07b>] ? vfs_write+0x14b/0x1f0
      [57600.404400]  [<ffffffff811733cc>] SyS_ioctl+0x3c/0x70
      [57600.409447]  [<ffffffff81a6ad97>] entry_SYSCALL_64_fastpath+0x12/0x6a
      [57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d
      [57600.435460] RIP  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
      [57600.441208]  RSP <ffff880043073810>
      [57600.444690] CR2: 0000000000000000
      [57600.448000] ---[ end trace cbc6b5cc4bf9831d ]---
      
      The problem is that resize_stripes() releases new stripe_heads before assigning new
      slab cache to conf->slab_cache. If the shrinker function raid5_cache_scan() gets called
      after resize_stripes() starting releasing new stripes but right before new slab cache
      being assigned, it is possible that these new stripe_heads will be freed with the old
      slab_cache which was already been destoryed and that triggers this bug.
      Signed-off-by: default avatarDennis Yang <dennisyang@qnap.com>
      Fixes: edbe83ab ("md/raid5: allow the stripe_cache to grow and shrink.")
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1800fde3
    • Joe Thornber's avatar
      dm space map disk: fix some book keeping in the disk space map · d7252903
      Joe Thornber authored
      commit 0377a07c upstream.
      
      When decrementing the reference count for a block, the free count wasn't
      being updated if the reference count went to zero.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7252903
    • Joe Thornber's avatar
      dm thin metadata: call precommit before saving the roots · f568a102
      Joe Thornber authored
      commit 91bcdb92 upstream.
      
      These calls were the wrong way round in __write_initial_superblock.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f568a102
    • Mikulas Patocka's avatar
      dm bufio: make the parameter "retain_bytes" unsigned long · 17cedf3b
      Mikulas Patocka authored
      commit 13840d38 upstream.
      
      Change the type of the parameter "retain_bytes" from unsigned to
      unsigned long, so that on 64-bit machines the user can set more than
      4GiB of data to be retained.
      
      Also, change the type of the variable "count" in the function
      "__evict_old_buffers" to unsigned long.  The assignment
      "count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
      could result in unsigned long to unsigned overflow and that could result
      in buffers not being freed when they should.
      
      While at it, avoid division in get_retain_buffers().  Division is slow,
      we can change it to shift because we have precalculated the log2 of
      block size.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17cedf3b
    • Mike Snitzer's avatar
      dm cache metadata: fail operations if fail_io mode has been established · 2e80638c
      Mike Snitzer authored
      commit 10add84e upstream.
      
      Otherwise it is possible to trigger crashes due to the metadata being
      inaccessible yet these methods don't safely account for that possibility
      without these checks.
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e80638c
    • Bart Van Assche's avatar
      dm mpath: delay requeuing while path initialization is in progress · a5b9afd6
      Bart Van Assche authored
      commit c1d7ecf7 upstream.
      
      Requeuing a request immediately while path initialization is ongoing
      causes high CPU usage, something that is undesired.  Hence delay
      requeuing while path initialization is in progress.
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5b9afd6
    • Bart Van Assche's avatar
      dm mpath: avoid that path removal can trigger an infinite loop · 5087ded2
      Bart Van Assche authored
      commit 7083abbb upstream.
      
      If blk_get_request() fails, check whether the failure is due to a path
      being removed.  If that is the case, fail the path by triggering a call
      to fail_path().  This avoids that the following scenario can be
      encountered while removing paths:
      * CPU usage of a kworker thread jumps to 100%.
      * Removing the DM device becomes impossible.
      
      Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK,
      and the queue is not dying, because in these cases immediate requeuing
      is inappropriate.
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5087ded2
    • Bart Van Assche's avatar
      dm mpath: split and rename activate_path() to prepare for its expanded use · 5a094143
      Bart Van Assche authored
      commit 89bfce76 upstream.
      
      activate_path() is renamed to activate_path_work() which now calls
      activate_or_offline_path().  activate_or_offline_path() will be used
      by the next commit.
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a094143
    • Bart Van Assche's avatar
      dm mpath: requeue after a small delay if blk_get_request() fails · 4ff00db7
      Bart Van Assche authored
      commit 06eb061f upstream.
      
      If blk_get_request() returns ENODEV then multipath_clone_and_map()
      causes a request to be requeued immediately. This can cause a kworker
      thread to spend 100% of the CPU time of a single core in
      __blk_mq_run_hw_queue() and also can cause device removal to never
      finish.
      
      Avoid this by only requeuing after a delay if blk_get_request() fails.
      Additionally, reduce the requeue delay.
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ff00db7
    • Mikulas Patocka's avatar
      dm bufio: check new buffer allocation watermark every 30 seconds · 7d67cd02
      Mikulas Patocka authored
      commit 390020ad upstream.
      
      dm-bufio checks a watermark when it allocates a new buffer in
      __bufio_new().  However, it doesn't check the watermark when the user
      changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.
      
      This may result in a problem - if the watermark is high enough so that
      all possible buffers are allocated and if the user lowers the value of
      "max_cache_size_bytes", the watermark will never be checked against the
      new value because no new buffer would be allocated.
      
      To fix this, change __evict_old_buffers() so that it checks the
      watermark.  __evict_old_buffers() is called every 30 seconds, so if the
      user reduces "max_cache_size_bytes", dm-bufio will react to this change
      within 30 seconds and decrease memory consumption.
      
      Depends-on: 1b0fb5a5 ("dm bufio: avoid a possible ABBA deadlock")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d67cd02