1. 28 Feb, 2017 1 commit
  2. 17 Feb, 2017 2 commits
    • dm: flush queued bios when process blocks to avoid deadlock · d67a5f4b
      Mikulas Patocka authored
      Commit df2cb6da ("block: Avoid deadlocks with bio allocation by
      stacking drivers") created a workqueue for every bio set and code
      in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
      by redirecting bios queued on current->bio_list to the workqueue if the
      system is low on memory.  However other deadlocks (see below **) may
      happen, without any low memory condition, because generic_make_request
      is queuing bios to current->bio_list (rather than submitting them).
      
      ** the related dm-snapshot deadlock is detailed here:
      https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html
      
      Fix this deadlock by redirecting any bios on current->bio_list to the
      bio_set's rescue workqueue on every schedule() call.  Consequently,
      when the process blocks on a mutex, the bios queued on
      current->bio_list are dispatched to independent workqueues and they can
      complete without waiting for the mutex to be available.
      
      The structure blk_plug contains an entry cb_list and this list can contain
      arbitrary callback functions that are called when the process blocks.
      To implement this fix DM (ab)uses the onstack plug's cb_list interface
      to get its flush_current_bio_list() called at schedule() time.
      
      This fixes the snapshot deadlock - if the map method blocks,
      flush_current_bio_list() will be called and it redirects bios waiting
      on current->bio_list to appropriate workqueues.
      
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
      Depends-on: df2cb6da ("block: Avoid deadlocks with bio allocation by stacking drivers")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
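The flush-on-block idea in this commit can be illustrated with a toy userspace model. This is only a sketch, not kernel code: `Task`, `RescueQueue` and `schedule` are hypothetical stand-ins for the process, the bio_set's rescue workqueue and the scheduler entry point, and only the name `flush_current_bio_list` comes from the commit message.

```python
from collections import deque


class Task:
    """Toy stand-in for a process with a current->bio_list and a plug cb_list."""

    def __init__(self):
        self.bio_list = deque()   # bios queued, not yet submitted
        self.plug_callbacks = []  # callbacks run when the task blocks


class RescueQueue:
    """Toy stand-in for a bio_set's rescue workqueue (hypothetical)."""

    def __init__(self):
        self.completed = []

    def run(self, bio):
        # An independent worker would complete the bio here.
        self.completed.append(bio)


def flush_current_bio_list(task, rescue):
    """Hand every queued bio to the rescue queue so it no longer waits on us."""
    while task.bio_list:
        rescue.run(task.bio_list.popleft())


def schedule(task):
    """Model of blocking: run the registered plug callbacks first."""
    for callback in task.plug_callbacks:
        callback()


task = Task()
rescue = RescueQueue()
task.plug_callbacks.append(lambda: flush_current_bio_list(task, rescue))

# The map method queues two bios, then blocks (for example on a mutex).
task.bio_list.extend(["bio-A", "bio-B"])
schedule(task)

print(rescue.completed)    # ['bio-A', 'bio-B']
print(len(task.bio_list))  # 0
```

In this model the queued bios complete even though the task that queued them is about to sleep, which is the property the commit relies on to break the snapshot deadlock.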
    • dm round robin: revert "use percpu 'repeat_count' and 'current_path'" · 37a098e9
      Mike Snitzer authored
      The sloppy nature of lockless access to percpu pointers
      (s->current_path) in rr_select_path(), from multiple threads, is
      causing some paths to be used more than others -- which results in lower
      IO performance being observed.
      
      Revert these upstream commits to restore truly symmetric round-robin
      IO submission in DM multipath:
      
      b0b477c7 dm round robin: use percpu 'repeat_count' and 'current_path'
      802934b2 dm round robin: do not use this_cpu_ptr() without having preemption disabled
      
      There is no benefit to all this complexity if repeat_count = 1 (which is
      the recommended default).
      
      Cc: stable@vger.kernel.org # 4.6+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
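A minimal sketch of what symmetric round-robin selection means when repeat_count is 1, assuming a single shared cursor advanced under a lock. This is a userspace model, not the dm-round-robin code; the `RoundRobinSelector` class, the path names and the thread counts are all hypothetical.

```python
import threading
from collections import Counter


class RoundRobinSelector:
    """Hypothetical selector: one shared cursor, advanced under a lock."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._next = 0
        self._lock = threading.Lock()

    def select_path(self):
        # Every caller sees and advances the same cursor, so no path is skipped.
        with self._lock:
            path = self._paths[self._next]
            self._next = (self._next + 1) % len(self._paths)
            return path


selector = RoundRobinSelector(["sda", "sdb", "sdc"])
usage = Counter()
usage_lock = threading.Lock()


def submit_ios(count):
    for _ in range(count):
        path = selector.select_path()
        with usage_lock:
            usage[path] += 1


threads = [threading.Thread(target=submit_ios, args=(300,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(dict(usage))  # each of the three paths is selected 400 times
```

Because the cursor is shared, 1200 selections over three paths always split evenly, regardless of how the threads interleave. Per-CPU cursors read without synchronization give no such guarantee.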
  3. 16 Feb, 2017 14 commits
  4. 25 Jan, 2017 6 commits
    • Mike Snitzer's avatar
    • Heinz Mauelshagen's avatar
      977f1a0a
    • dm raid: use read_disk_sb() throughout · e2568465
      Heinz Mauelshagen authored
      For consistency, call read_disk_sb() from
      attempt_restore_of_faulty_devices() instead
      of calling sync_page_io() directly.
      
      Explicitly set device to faulty on superblock read error.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: add raid4/5/6 journaling support · 63c32ed4
      Heinz Mauelshagen authored
      Add md raid4/5/6 journaling support (upstream commit bac624f3 started
      the implementation) which closes the write hole (i.e. non-atomic updates
      to stripes) using a dedicated journal device.
      
      Background:
      raid4/5/6 stripes hold N data payloads per stripe plus one parity payload
      (raid4/5) or two P/Q syndrome payloads (raid6) in an in-memory stripe cache.
      The parity or P/Q syndromes, which are used to recover data payloads after
      a disk failure, are calculated from the N data payloads and must be updated
      on the different component devices of the raid device.  Those are
      non-atomic, persistent updates.  Hence a crash can leave some stripe
      payloads not updated persistently and thus cause data loss during stripe
      recovery.
      This problem is addressed by writing whole stripe cache entries (together with
      journal metadata) to a persistent journal entry on a dedicated journal device.
      Only if that journal entry is written successfully is the stripe cache entry
      updated on the component devices of the raid device (i.e. writethrough type).
      In case of a crash, the entry can be recovered from the journal and written
      again, thus ensuring consistent stripe payloads suitable for data recovery.
      
      Future dependencies:
      once writeback caching, which is being worked on to compensate for the
      throughput implications of writethrough overhead, is supported with
      journaling upstream, an additional patch based on this one will support
      it in dm-raid.
      
      Journal resilience related remarks:
      because stripes are recovered from the journal in case of a crash, the
      journal device should be resilient.  Resilience becomes mandatory with
      future writeback support, because losing the working set in the log
      means data loss, as opposed to writethrough, where the loss of the
      journal device 'only' reintroduces the write hole.
      
      Fix comment on data offsets in parse_dev_params() and initialize
      new_data_offset as well.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
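The write hole and the writethrough journal described in this commit can be sketched with a toy model. It assumes simple XOR parity (raid4/5 style) and in-memory lists standing in for the component devices and the journal device. All function names and the `crash_after` hook are hypothetical and do not mirror the md or dm-raid implementation.

```python
from functools import reduce


def xor_parity(chunks):
    """RAID4/5-style parity: byte-wise XOR of all data chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def write_stripe(devices, journal, data_chunks, crash_after=None):
    """Writethrough: journal the whole stripe, then update the devices.

    crash_after (hypothetical test hook) stops after that many device writes.
    """
    stripe = list(data_chunks) + [xor_parity(data_chunks)]
    journal.append(stripe)                 # 1. persist the journal entry
    for index, payload in enumerate(stripe):
        if crash_after is not None and index >= crash_after:
            return False                   # simulated crash mid-update
        devices[index] = payload           # 2. non-atomic per-device writes
    journal.pop()                          # 3. entry no longer needed
    return True


def replay(devices, journal):
    """After a crash, rewrite every journaled stripe to the devices."""
    while journal:
        for index, payload in enumerate(journal.pop(0)):
            devices[index] = payload


def is_consistent(devices):
    """True when the last device holds the XOR of all the others."""
    return xor_parity(devices[:-1]) == devices[-1]


devices = [b"\x00\x00", b"\x00\x00", b"\x00\x00"]  # two data legs + parity
journal = []

write_stripe(devices, journal, [b"\x0f\xf0", b"\x33\x55"], crash_after=1)
print(is_consistent(devices))  # False: the write hole
replay(devices, journal)
print(is_consistent(devices))  # True
```

The simulated crash leaves one data leg updated while the other leg and the parity are stale. Replaying the journaled stripe restores a consistent stripe, which is why the journal entry has to be persistent before any component device is touched.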
    • dm raid: be prepared to accept arbitrary '- -' tuples · 50c4feb9
      Heinz Mauelshagen authored
      During raid set resize checks and setting up the recovery offset in case a raid
      set grows, calculated rd->md.dev_sectors is compared to rs->dev[0].rdev.sectors.
      
      Device 0 may not be defined in case userspace passes in '- -' for it
      (lvm2 doesn't do that so far), thus its device sectors can't be taken
      authoritatively in this comparison and another valid device must be used
      to retrieve the device size.
      
      Use mddev->dev_sectors in checking for ongoing recovery for the same reason.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
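The rule this commit describes, taking the device size from any defined device rather than always from device 0, might look like the following sketch. The function name and the use of `None` to represent a '- -' slot are assumptions made for illustration; the sector counts are example values.

```python
from typing import Optional, Sequence


def reference_sectors(dev_sectors: Sequence[Optional[int]]) -> Optional[int]:
    """Return the size of the first defined device, skipping '- -' slots.

    None marks a slot that userspace passed as '- -' (hypothetical encoding).
    """
    for sectors in dev_sectors:
        if sectors is not None:
            return sectors
    return None


# Device 0 was passed as '- -', so device 1 supplies the size.
print(reference_sectors([None, 2097152, 2097152]))  # 2097152
```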
    • dm raid: fix transient device failure processing · c63ede3b
      Heinz Mauelshagen authored
      This fix addresses the following 3 failure scenarios:
      
      1) If a (transiently) inaccessible metadata device is being passed into the
      constructor (e.g. a device tuple '254:4 254:5'), it is processed as if
      '- -' was given.  This erroneously results in a status table line containing
      '- -', which mistakenly differs from what has been passed in.  As a result,
      userspace libdevmapper puts the device tuple separate from the RAID device
      thus not processing the dependencies properly.
      
      2) False health status char 'A' instead of 'D' is emitted on the status
      info line for the meta/data device tuple in this metadata device
      failure case.
      
      3) If the metadata device is accessible when passed into the constructor
      but the data device (partially) isn't, that leg may be set faulty by the
      raid personality on access to the (partially) unavailable leg.  A restore
      attempted in a second raid device resume on such a failed leg (status
      char 'D') fails after the (partial) leg has returned.
      
      Fixes for aforementioned failure scenarios:
      
      - don't release passed in devices in the constructor thus allowing the
        status table line to e.g. contain '254:4 254:5' rather than '- -'
      
      - emit device status char 'D' rather than 'A' for the device tuple
        with the failed metadata device on the status info line
      
      - when attempting to restore faulty devices in a second resume, allow the
        device hot remove function to succeed by setting the device to not in-sync
      
      In case userspace intentionally passes '- -' into the constructor to avoid that
      device tuple (e.g. to split off a raid1 leg temporarily for later re-addition),
      the status table line will correctly show '- -' and the status info line will
      provide a '-' device health character for the non-defined device tuple.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
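The per-tuple health characters mentioned in this commit ('A', 'D' and '-') can be summarized in a small sketch. The function and its boolean inputs are hypothetical. It covers only the three characters named in the commit message and is not the dm-raid status code.

```python
def health_char(defined: bool, meta_ok: bool, data_ok: bool) -> str:
    """Pick the status info character for one meta/data device tuple."""
    if not defined:
        return "-"   # userspace intentionally passed '- -'
    if not (meta_ok and data_ok):
        return "D"   # metadata or data device failed
    return "A"       # tuple is alive


tuples = [(True, True, True), (True, False, True), (False, False, False)]
print("".join(health_char(*t) for t in tuples))  # AD-
```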
  5. 22 Jan, 2017 10 commits
    • Linux 4.10-rc5 · 7a308bb3
      Linus Torvalds authored
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 095cbe66
      Linus Torvalds authored
      Pull x86 fix from Thomas Gleixner:
       "Restore the retrigger callbacks in the IO APIC irq chips. That
        addresses a long standing regression which got introduced with the
        rewrite of the x86 irq subsystem two years ago and went unnoticed so
        far"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioapic: Restore IO-APIC irq_chip retrigger callback
    • Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 24b86839
      Linus Torvalds authored
      Pull smp/hotplug fix from Thomas Gleixner:
       "Remove an unused variable which is a leftover from the notifier
        removal"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Remove unused but set variable in _cpu_down()
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 585457fc
      Linus Torvalds authored
      Pull virtio/vhost fixes from Michael Tsirkin:
       "Random fixes and cleanups that accumulated over the time"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio/s390: virtio: constify virtio_config_ops structures
        virtio/s390: add missing \n to end of dev_err message
        virtio/s390: support READ_STATUS command for virtio-ccw
        tools/virtio/ringtest: tweaks for s390
        tools/virtio/ringtest: fix run-on-all.sh for offline cpus
        virtio_console: fix a crash in config_work_handler
        vhost/scsi: silence uninitialized variable warning
        vhost: scsi: constify target_core_fabric_ops structures
    • Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · bb6c01c2
      Linus Torvalds authored
      Pull thermal management fixes from Zhang Rui:
      
       - fix a regression that thermal zone dynamically allocated sysfs
         attributes are freed before they're removed, which is introduced in
         4.10-rc1 (Jacob von Chorus)
      
       - fix a boot warning because deprecated hwmon API is used (Fabio
         Estevam)
      
       - a couple of fixes for rockchip thermal driver (Brian Norris, Caesar
         Wang)
      
      * 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        thermal: rockchip: fixes the conversion table
        thermal: core: move tz->device.groups cleanup to thermal_release
        thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()
        thermal: rockchip: handle set_trips without the trip points
        thermal: rockchip: optimize the conversion table
        thermal: rockchip: fixes invalid temperature case
        thermal: rockchip: don't pass table structs by value
        thermal: rockchip: improve conversion error messages
    • Merge tag 'usb-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c497f8d1
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a few small USB fixes for 4.10-rc5.
      
        Most of these are gadget/dwc2 fixes for reported issues, all of these
        have been in linux-next for a while. The last one is a single xhci
        WARN_ON removal to handle an issue that the dwc3 driver is hitting in
        the 4.10-rc tree. The warning is harmless and needs to be removed, and
        a "real" fix that is more complex will show up in 4.11-rc1 for this
        device.
      
        That last patch hasn't been in linux-next yet due to the weekend
        timing, but it's a "simple" WARN_ON() removal so what could go wrong?
        :)"
      
      Famous last words.
      
      * tag 'usb-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        xhci: remove WARN_ON if dma mask is not set for platform devices
        usb: dwc2: host: fix Wmaybe-uninitialized warning
        usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value
        usb: gadget: udc: atmel: remove memory leak
        usb: dwc3: exynos fix axius clock error path to do cleanup
        usb: dwc2: Avoid suspending if we're in gadget mode
        usb: dwc2: use u32 for DT binding parameters
        usb: gadget: f_fs: Fix iterations on endpoints.
        usb: dwc2: gadget: Fix DMA memory freeing
        usb: gadget: composite: Fix function used to free memory
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · f68d8531
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "Two fixes:
      
         - a regression fix for the multiple-pmem-namespace-per-region support
           added in 4.9. Even if an existing environment is not using that
           feature, the act of creating and destroying a single namespace
           with the ndctl utility will lead to the proliferation of extra
           unwanted namespace devices.
      
         - a fix for the error code returned from the pmem driver when the
           memcpy_mcsafe() routine returns -EFAULT. Btrfs seems to be the only
           block I/O consumer that tries to parse the meaning of the error
           code when it is non-zero.
      
        Neither of these fixes are critical, the namespace leak is awkward in
        that it can cause device naming to change and complicates debugging
        namespace initialization issues. The error code fix is included out of
        caution for what other consumers might be expecting -EIO for block I/O
        errors"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero
        pmem: return EIO on read_pmem() failure
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · f5e8c0ff
      Linus Torvalds authored
      Pull clk fix from Stephen Boyd:
       "One fix for Samsung Exynos542x SoCs where recent IOMMU patches have
        caused some of these clocks to turn off when they were always left on
        before"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk/samsung: exynos542x: mark some clocks as critical
    • Merge tag 'arc-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 455a70cb
      Linus Torvalds authored
      Pull ARC fixes from Vineet Gupta:
      
       - more intc updates [Yuriv]
      
       - fix module build when unwinder is turned off
      
       - IO Coherency Programming model updates
      
       - other miscellaneous
      
      * tag 'arc-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: Revert "ARC: mm: IOC: Don't enable IOC by default"
        ARC: mm: split arc_cache_init to allow __init reaping of bulk
        ARCv2: IOC: Use actual memory size to setup aperture size
        ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption
        ARCv2: IOC: refactor the IOC and SLC operations into own functions
        ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds
        ARCv2: save r30 on kernel entry as gcc uses it for code-gen
        ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP
        ARC: IRQ: Use hwirq instead of virq in mask/unmask
        ARC: mmu: clarify the MMUv3 programming model
    • Merge tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 83fd57a7
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Two fixes for fallout from the hugetlb changes we merged this cycle.
      
        Ten other fixes, four only affect Power9, and the rest are a bit of a
        mixture though nothing terrible.
      
        Thanks to: Aneesh Kumar K.V, Anton Blanchard, Benjamin Herrenschmidt,
        Dave Martin, Gavin Shan, Madhavan Srinivasan, Nicholas Piggin, Reza
        Arbab"
      
      * tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc: Ignore reserved field in DCSR and PVR reads and writes
        powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write
        powerpc/ptrace: Preserve previous fprs/vsrs on short regset write
        powerpc/perf: Use MSR to report privilege level on P9 DD1
        selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
        powerpc/eeh: Enable IO path on permanent error
        powerpc/perf: Fix PM_BRU_CMPL event code for power9
        powerpc/mm: Fix little-endian 4K hugetlb
        powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size
        powerpc: Fix pgtable pmd cache init
        powerpc/icp-opal: Fix missing KVM case and harden replay
        powerpc/mm: Fix memory hotplug BUG() on radix
  6. 20 Jan, 2017 7 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 4c9eff7a
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM:
         - Fix for timer setup on VHE machines
         - Drop spurious warning when the timer races against the vcpu running
           again
         - Prevent a vgic deadlock when the initialization fails (for stable)
      
        s390:
         - Fix a kernel memory exposure (for stable)
      
        x86:
         - Fix exception injection when hypercall instruction cannot be
           patched"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: s390: do not expose random data via facility bitmap
        KVM: x86: fix fixing of hypercalls
        KVM: arm/arm64: vgic: Fix deadlock on error handling
        KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
        KVM: arm/arm64: Fix occasional warning from the timer work function
    • Merge branch 'scsi-target-for-v4.10' of... · 51162264
      Linus Torvalds authored
      Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
      
      Pull SCSI target fixes from Bart Van Assche:
      
       - two small fixes for the ibmvscsis driver
      
       - ten patches with bug fixes for the target mode of the qla2xxx driver
      
       - four patches that avoid that the "sparse" and "smatch" static
         analyzer tools report false positives for the qla2xxx code base
      
      * 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux:
        qla2xxx: Disable out-of-order processing by default in firmware
        qla2xxx: Fix erroneous invalid handle message
        qla2xxx: Reduce exess wait during chip reset
        qla2xxx: Terminate exchange if corrupted
        qla2xxx: Fix crash due to null pointer access
        qla2xxx: Collect additional information to debug fw dump
        qla2xxx: Reset reserved field in firmware options to 0
        qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
        qla2xxx: Include ATIO queue in firmware dump when in target mode
        qla2xxx: Fix wrong IOCB type assumption
        qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables
        qla2xxx: Move two arrays from header files to .c files
        qla2xxx: Declare an array with file scope static
        qla2xxx: Fix indentation
        ibmvscsis: Fix sleeping in interrupt context
        ibmvscsis: Fix max transfer length
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · e3737b91
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Just two small fixes for this -rc.
      
        One is just killing an unused variable from Keith, but the other
        fixes a performance regression for nbd in this series, where we
        inadvertently flipped when we set MSG_MORE when outputting data"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nbd: only set MSG_MORE when we have more to send
        blk-mq: Remove unused variable
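The nbd fix named above ("only set MSG_MORE when we have more to send") comes down to a per-chunk flag decision, sketched here. The `send_flags` helper is hypothetical and no socket is opened. `MSG_MORE` is taken from Python's `socket` module where available, with the Linux value as an assumed fallback.

```python
import socket

# MSG_MORE is Linux-specific; fall back to its Linux value if the constant
# is missing from this platform's socket module.
MSG_MORE = getattr(socket, "MSG_MORE", 0x8000)


def send_flags(chunk_count):
    """Return one flags value per chunk: MSG_MORE on all but the last."""
    return [MSG_MORE if index < chunk_count - 1 else 0
            for index in range(chunk_count)]


flags = send_flags(3)
print([bool(f & MSG_MORE) for f in flags])  # [True, True, False]
```

Setting the flag on the final chunk would tell the network stack to keep waiting for data that never arrives, which is consistent with the performance regression the pull message describes.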
    • Merge tag 'spi-fix-v4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · cca112ec
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "The usual small smattering of driver specific fixes. A few bits that
        stand out here:
      
         - the R-Car patches adding fallbacks are just adding new compatible
           strings to the driver so that device trees are written in a more
           robustly future proof fashion, this isn't strictly a fix but it's
           just new IDs and it's better to get it into mainline sooner to
           improve the ABI
      
         - the DesignWare "switch to new API part 2" patch is actually a
           misleadingly titled fix for a bit that got missed in the original
           conversion"
      
      * tag 'spi-fix-v4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: davinci: use dma_mapping_error()
        spi: spi-axi: Free resources on error path
        spi: pxa2xx: add missed break
        spi: dw-mid: switch to new dmaengine_terminate_* API (part 2)
        spi: dw: Make debugfs name unique between instances
        spi: sh-msiof: Do not use C++ style comment
        spi: armada-3700: Set mode bits correctly
        spi: armada-3700: fix unsigned compare than zero on irq
        spi: sh-msiof: Add R-Car Gen 2 and 3 fallback bindings
        spi: SPI_FSL_DSPI should depend on HAS_DMA
    • Merge tag 'ceph-for-4.10-rc5' of git://github.com/ceph/ceph-client · e90665a5
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Three filesystem endianness fixes (one goes back to the 2.6 era, all
        marked for stable) and two fixups for this merge window's patches"
      
      * tag 'ceph-for-4.10-rc5' of git://github.com/ceph/ceph-client:
        ceph: fix bad endianness handling in parse_reply_info_extra
        ceph: fix endianness bug in frag_tree_split_cmp
        ceph: fix endianness of getattr mask in ceph_d_revalidate
        libceph: make sure ceph_aes_crypt() IV is aligned
        ceph: fix ceph_get_caps() interruption
    • Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 56ef1882
      Linus Torvalds authored
      Pull overlayfs fix from Miklos Szeredi:
       "This fixes a regression introduced in this cycle"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: fix possible use after free on redirect dir lookup
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · eefa9feb
      Linus Torvalds authored
      Pull fuse fixes from Miklos Szeredi:
       "Fix two regressions, one introduced in 4.9 and a less recent one in
        4.2"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: fix time_to_jiffies nsec sanity check
        fuse: clear FR_PENDING flag when moving requests out of pending queue