1. 15 Apr, 2015 20 commits
    • dm: add log writes target · 0e9cebe7
      Josef Bacik authored
      Introduce a new target that is meant for file system developers to test file
      system integrity at particular points in the life of a file system.  We capture
      all write requests and associated data and log them to a separate device
      for later replay.  There is a userspace utility to do this replay.  The
      idea behind this is to give file system developers a tool to verify that
      the file system is always consistent.
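As a rough illustration of the log-then-replay idea, the sketch below models a device as a small array of one-byte sectors, records each write in an in-memory log, and rebuilds the device state after any prefix of that log. The names (`log_write`, `replay`, `struct write_entry`) and sizes are assumptions made for illustration; they are not the target's log format or the userspace utility's interface.

```c
#include <stddef.h>
#include <string.h>

#define NR_SECTORS 8    /* toy device size (assumption) */
#define MAX_ENTRIES 16  /* toy log capacity (assumption) */

/* One logged write: which sector was written and with what data. */
struct write_entry {
    size_t sector;
    unsigned char data;
};

struct write_log {
    struct write_entry entries[MAX_ENTRIES];
    size_t count;
};

/* Apply a write to the device and record it in the log. Returns 0 on success. */
static int log_write(struct write_log *log, unsigned char *dev,
                     size_t sector, unsigned char data)
{
    if (sector >= NR_SECTORS || log->count >= MAX_ENTRIES)
        return -1;
    dev[sector] = data;
    log->entries[log->count].sector = sector;
    log->entries[log->count].data = data;
    log->count++;
    return 0;
}

/* Rebuild the device state as it was after the first n logged writes. */
static void replay(const struct write_log *log, size_t n, unsigned char *dev)
{
    memset(dev, 0, NR_SECTORS);
    if (n > log->count)
        n = log->count;
    for (size_t i = 0; i < n; i++)
        dev[log->entries[i].sector] = log->entries[i].data;
}
```

Replaying to different values of `n` gives the device image at each point in its history, which is where a consistency check would be run.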
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Zach Brown <zab@zabbo.net>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm table: use bool function return values of true/false not 1/0 · 7f61f5a0
      Joe Perches authored
      Use the normal return values for bool functions.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm verity: add error handling modes for corrupted blocks · 65ff5b7d
      Sami Tolvanen authored
      Add device specific modes to dm-verity to specify how corrupted
      blocks should be handled.  The following modes are defined:
      
        - DM_VERITY_MODE_EIO is the default behavior, where reading a
          corrupted block results in -EIO.
      
        - DM_VERITY_MODE_LOGGING only logs corrupted blocks, but does
          not block the read.
      
        - DM_VERITY_MODE_RESTART calls kernel_restart when a corrupted
          block is discovered.
      
      In addition, each mode sends a uevent to notify userspace of
      corruption and to allow further recovery actions.
      
      The driver defaults to previous behavior (DM_VERITY_MODE_EIO)
      and other modes can be enabled with an additional parameter to
      the verity table.
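The three modes can be pictured as a small dispatch on the configured mode. The following is a userspace sketch only: the mode names come from the description above, while the numeric enum values, `struct corruption_action`, and `on_corrupted_block` are hypothetical and do not reproduce the driver's code.

```c
#include <stdbool.h>

/* Mode names follow the commit message; the numeric values are assumptions. */
enum verity_mode {
    DM_VERITY_MODE_EIO,
    DM_VERITY_MODE_LOGGING,
    DM_VERITY_MODE_RESTART,
};

/* What this sketch decides to do about one corrupted block. */
struct corruption_action {
    bool notify_userspace; /* a uevent is sent in every mode */
    bool fail_read;        /* the read returns -EIO */
    bool restart;          /* kernel_restart would be called */
};

static struct corruption_action on_corrupted_block(enum verity_mode mode)
{
    /* Fields not named here start out false. */
    struct corruption_action act = { .notify_userspace = true };

    switch (mode) {
    case DM_VERITY_MODE_LOGGING:
        /* Only log: the read is allowed to proceed. */
        break;
    case DM_VERITY_MODE_RESTART:
        act.restart = true;
        break;
    case DM_VERITY_MODE_EIO:
    default:
        /* Default behavior: fail the read. */
        act.fail_read = true;
        break;
    }
    return act;
}
```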
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm thin: remove stale 'trim' message documentation · 0e0e32c1
      Mike Snitzer authored
      The 'trim' message wasn't ever implemented.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm delay: use msecs_to_jiffies for time conversion · aca607ba
      Nicholas Mc Guire authored
      Converting milliseconds to jiffies by "val * HZ / 1000" is technically
      OK but msecs_to_jiffies(val) is the cleaner solution and handles all
      corner cases correctly.
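One such corner case is rounding for short delays. The sketch below is a userspace model with an assumed tick rate of 100 Hz (`TOY_HZ`); `rounded_ms_to_jiffies` imitates round-up behaviour for the simple case where the tick rate divides 1000 evenly. It is an illustration, not the kernel's implementation of msecs_to_jiffies().

```c
#define TOY_HZ 100UL            /* assumed tick rate; real kernels vary */
#define TOY_MSEC_PER_SEC 1000UL

/* The open-coded conversion: truncates toward zero. */
static unsigned long naive_ms_to_jiffies(unsigned long ms)
{
    return ms * TOY_HZ / TOY_MSEC_PER_SEC;
}

/* Rounds up, so a non-zero delay never becomes zero ticks. */
static unsigned long rounded_ms_to_jiffies(unsigned long ms)
{
    unsigned long ms_per_tick = TOY_MSEC_PER_SEC / TOY_HZ;

    return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

With these assumptions a 5 ms delay becomes 0 ticks under the open-coded form and 1 tick under the rounded form. The open-coded multiplication can also overflow for very large inputs, which the sketch does not model.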
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm log userspace base: fix compile warning · 18cc980a
      Nicholas Mc Guire authored
      This fixes up a compile warning [-Wunused-but-set-variable] - given the
      comment in userspace_set_region_sync() the non-reporting of errors is
      intentional so the return value can be dropped to make gcc happy.
      
      Also, fix typo in comment.
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm log userspace transfer: match wait_for_completion_timeout return type · c32a512f
      Nicholas Mc Guire authored
      Return type of wait_for_completion_timeout() is unsigned long not int.
      An appropriately named unsigned long is added and the assignment fixed.
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm table: fall back to getting device using name_to_dev_t() · 644bda6f
      Dan Ehrenberg authored
      If a device is used as the root filesystem, it can't be built
      off of devices which are within the root filesystem (just like
      command line arguments to root=).  For this reason, Linux has a
      pseudo-filesystem for root= and MD initialization (based on the
      function name_to_dev_t) which handles different ways of specifying
      devices including PARTUUID and major:minor.
      
      Switch to using name_to_dev_t() in dm_get_device().  Rather than
      having DM assume that all things which are not major:minor are paths in
      an already-mounted filesystem, change dm_get_device() to first attempt
      to look up the device in the filesystem, and if not found it will fall
      back to using name_to_dev_t().
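The lookup order described here (filesystem first, name interpretation second) can be sketched in userspace as follows. `lookup_device` and `toy_name_to_dev` are hypothetical names: the latter is a stand-in that understands only major:minor, whereas the real name_to_dev_t() handles more forms. Requiring a block-device node in the first step is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

enum lookup_result { FOUND_BY_PATH, FOUND_BY_NAME, NOT_FOUND };

/* Hypothetical stand-in for name_to_dev_t(): accepts only "major:minor". */
static bool toy_name_to_dev(const char *name, unsigned int *maj,
                            unsigned int *min)
{
    char extra;

    return sscanf(name, "%u:%u%c", maj, min, &extra) == 2;
}

/* Try the filesystem first; fall back to interpreting the name. */
static enum lookup_result lookup_device(const char *name)
{
    struct stat st;
    unsigned int maj, min;

    /* Step 1: is there a block-device node at this path? */
    if (stat(name, &st) == 0 && S_ISBLK(st.st_mode))
        return FOUND_BY_PATH;

    /* Step 2: otherwise, interpret the string itself. */
    if (toy_name_to_dev(name, &maj, &min))
        return FOUND_BY_NAME;

    return NOT_FOUND;
}
```

Because the path check runs first, a matching node in the filesystem wins over the major:minor reading of the same string, which is the source of the compatibility differences.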
      
      In terms of backwards compatibility, there are some cases where
      behavior will be different:
      - If you have a file in the current working directory named 1:2 and
        you initialize DM there, then it will try to use that file rather
        than the disk with that major:minor pair as a backing device.
      - Similarly for other bdev types which name_to_dev_t() knows how to
        interpret, the previous behavior was to repeatedly check for the
        existence of the file (e.g., while waiting for rootfs to come up)
        but the new behavior is to use the name_to_dev_t() interpretation.
        For example, if you have a file named /dev/ubiblock0_0 which is
        a symlink to /dev/sda3, but it is not yet present when DM starts
        to initialize, then the name_to_dev_t() interpretation will take
        precedence.
      
      These incompatibilities would only show up in really strange setups
      with bad practices so we shouldn't have to worry about them.
      Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • init: stricter checking of major:minor root= values · 283e7ad0
      Dan Ehrenberg authored
      In the kernel command-line, previously, root=1:2jakshflaksjdhfa would
      be accepted and interpreted just like root=1:2. This patch adds
      stricter checking so that additional characters after major:minor are
      rejected by root=.
      
      The goal of this change is to help in unifying DM's interpretation of
      its block device argument by using existing kernel code (name_to_dev_t).
      Since DM rejects malformed major:minor pairs, it seems reasonable for
      root= to reject them as well.
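The difference between lenient and strict parsing can be shown with a small userspace sketch. Both function names are hypothetical, and the use of sscanf() with a trailing `%c` probe is one common way to detect leftover input; it is offered as an illustration rather than as the code this patch adds.

```c
#include <stdbool.h>
#include <stdio.h>

/* Lenient: stops after the minor number and ignores whatever follows. */
static bool parse_major_minor_lenient(const char *s, unsigned int *maj,
                                      unsigned int *min)
{
    return sscanf(s, "%u:%u", maj, min) == 2;
}

/* Strict: accept "major:minor" only when nothing follows the minor number. */
static bool parse_major_minor_strict(const char *s, unsigned int *maj,
                                     unsigned int *min)
{
    char trailing;

    /* If the third conversion also succeeds, there was trailing input. */
    return sscanf(s, "%u:%u%c", maj, min, &trailing) == 2;
}
```

With this sketch, "1:2" is accepted by both variants, while "1:2jakshflaksjdhfa" is accepted only by the lenient one.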
      Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • init: export name_to_dev_t and mark name argument as const · e6e20a7a
      Dan Ehrenberg authored
      DM will switch its device lookup code to using name_to_dev_t() so it
      must be exported.  Also, the @name argument should be marked const.
      Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr · 17e149b8
      Mike Snitzer authored
      Request-based DM's blk-mq support defaults to off; but a user can easily
      change the default using the dm_mod.use_blk_mq module/boot option.
      
      Also, you can check what mode a given request-based DM device is using
      with: cat /sys/block/dm-X/dm/use_blk_mq
      
      This change enabled further cleanup and reduced work (e.g. the
      md->io_pool and md->rq_pool aren't created if using blk-mq).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342
      Mike Snitzer authored
      dm_mq_queue_rq() is in atomic context so care must be taken to not
      sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
      and dm-mpath's call to blk_get_request().  In the future the bioset
      allocations will hopefully go away (by removing support for partial
      completions of bios in a cloned request).
      
      Also prepare for supporting DM blk-mq on top of old-style request_fn
      device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
      will still be used to queue work if blk-mq is used on top of old-style
      request_fn device(s).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add full blk-mq support to request-based DM · bfebd1cd
      Mike Snitzer authored
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue on top of the underlying blk-mq device(s).  That first step
      didn't improve performance of DM multipath on top of fast blk-mq devices
      (e.g. NVMe) because the top-level old-style request_queue was severely
      limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      on top of the underlying blk-mq device(s).  This unlocks significant
      performance gains on fast blk-mq devices.  Keith Busch tested on his NVMe
      testbed and offered this really positive news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Keith Busch <keith.busch@intel.com>
    • dm: impose configurable deadline for dm_request_fn's merge heuristic · 0ce65797
      Mike Snitzer authored
      Otherwise, for sequential workloads, the dm_request_fn can allow
      excessive request merging at the expense of increased service time.
      
      Add a per-device sysfs attribute to allow the user to control how long a
      request, that is a reasonable merge candidate, can be queued on the
      request queue.  The resolution of this request dispatch deadline is in
      microseconds (ranging from 1 to 100000 usecs).  To set a 20us deadline:
        echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline
      
      The dm_request_fn's merge heuristic and associated extra accounting is
      disabled by default (rq_based_seq_io_merge_deadline is 0).
      
      This sysfs attribute is not applicable to bio-based DM devices so it
      will only ever report 0 for them.
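A simplified model of the decision this knob controls is sketched below: hold the request at the head of the queue only if the heuristic is enabled, the request continues the previous one in the same direction, and the deadline has not yet expired. `struct merge_state`, its fields, and `should_hold` are hypothetical names, and the logic is an illustration under those assumptions rather than a copy of dm_request_fn().

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-device state tracked by the sketch (field names are illustrative). */
struct merge_state {
    uint64_t last_end_sector;  /* sector just past the previous request */
    bool last_was_write;       /* direction of the previous request */
    uint64_t last_start_us;    /* when the previous request was started */
    unsigned int deadline_us;  /* 0 disables the heuristic */
};

/*
 * Return true if the request at the head of the queue should stay queued
 * a little longer: it looks like a merge candidate and the deadline has
 * not yet expired.
 */
static bool should_hold(const struct merge_state *st, uint64_t sector,
                        bool is_write, uint64_t now_us)
{
    if (st->deadline_us == 0)
        return false;  /* heuristic disabled (the default) */
    if (sector != st->last_end_sector || is_write != st->last_was_write)
        return false;  /* not sequential with the previous request */
    return now_us - st->last_start_us < st->deadline_us;
}
```

With a 20us deadline, a sequential write seen 10us after the previous request started is held, while the same request seen 20us or more afterwards is dispatched.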
      
      A request that is allowed to remain on the queue will block other
      requests on the queue.  But introducing a short dequeue delay has proven
      very effective at enabling certain sequential IO workloads on really
      fast, yet IOPS constrained, devices to build up slightly larger IOs --
      yielding 90+% throughput improvements.  Having precise control over the
      time taken to wait for larger requests to build affords control beyond
      that of waiting for certain IO sizes to accumulate (which would require
      a deadline anyway).  This knob will only ever make sense with sequential
      IO workloads and the particular value used is storage configuration
      specific.
      
      Given the expected niche use-case for when this knob is useful it has
      been deemed acceptable to expose this relatively crude method for
      crafting optimal IO on specific storage -- especially given the solution
      is simple yet effective.  In the context of DM multipath, it is
      advisable to tune this sysfs attribute to a value that offers the best
      performance for the common case (e.g. if 4 paths are expected active,
      tune for that; if paths fail then performance may be slightly reduced).
      
      Alternatives were explored to have request-based DM autotune this value
      (e.g. if/when paths fail) but they were quickly deemed too fragile and
      complex to warrant further design and development time.  If this problem
      proves more common as faster storage emerges we'll have to look at
      elevating a generic solution into the block core.
      Tested-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm sysfs: introduce ability to add writable attributes · b898320d
      Mike Snitzer authored
      Add DM_ATTR_RW() macro and establish .store method in dm_sysfs_ops.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: don't start current request if it would've merged with the previous · de3ec86d
      Mike Snitzer authored
      Request-based DM's dm_request_fn() is so fast to pull requests off the
      queue that steps need to be taken to promote merging by avoiding request
      processing if it makes sense.
      
      If the current request would've merged with previous request let the
      current request stay on the queue longer.
      Suggested-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms · d548b34b
      Mike Snitzer authored
      Commit 7eaceacc ("block: remove per-queue plugging") didn't justify
      DM's use of a 100ms delay; such an extended delay is a liability when
      there is reason to re-kick the queue.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: don't schedule delayed run of the queue if nothing to do · 9d1deb83
      Mike Snitzer authored
      In request-based DM's dm_request_fn(), if blk_peek_request() returns
      NULL just return.  Avoids unnecessary blk_delay_queue().
      Reported-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: only run the queue on completion if congested or no requests pending · 9a0e609e
      Mike Snitzer authored
      On really fast storage it can be beneficial to delay running the
      request_queue to allow the elevator more opportunity to merge requests.
      
      Otherwise, it has been observed that requests are being sent to
      q->request_fn much quicker than is ideal on IOPS-bound backends.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove request-based logic from make_request_fn wrapper · ff36ab34
      Mike Snitzer authored
      The old dm_request() method used for q->make_request_fn had a branch for
      request-based DM support but it isn't needed given that
      dm_init_request_based_queue() sets it to the standard blk_queue_bio()
      anyway.
      
      Clean up dm_init_md_queue() to be DM device-type agnostic and have
      dm_setup_md_queue() properly finish queue setup based on DM device-type
      (bio-based vs request-based).
      
      A followup block patch can be made to remove the export for
      blk_queue_bio() now that DM no longer calls it directly.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 31 Mar, 2015 12 commits
  3. 30 Mar, 2015 2 commits
  4. 29 Mar, 2015 6 commits
    • Linux 4.0-rc6 · e42391cd
      Linus Torvalds authored
    • Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 08f41f7c
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "The latest and greatest fixes for ARM platform code.  Worth pointing
        out are:
      
         - Lines-wise, largest is a PXA fix for dealing with interrupts on DT
           that was quite broken.  It's still newish code so while we could
           have held this off, it seemed appropriate to include now
      
         - Some GPIO fixes for OMAP platforms added a few lines.  These were
           also fixes for code recently added (this release).
      
         - Small OMAP timer fix to behave better with partially upstreamed
           platforms, which is quite welcome.
      
         - Allwinner fixes about operating point control, reducing
           overclocking in some cases for better stability.
      
        plus a handful of other smaller fixes across the map"
      
      * tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: juno: Fix misleading name of UART reference clock
        ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
        ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
        ARM: socfpga: dts: fix spi1 interrupt
        ARM: dts: Fix gpio interrupts for dm816x
        ARM: dts: dra7: remove ti,hwmod property from pcie phy
        ARM: OMAP: dmtimer: disable pm runtime on remove
        ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
        ARM: OMAP2+: Fix socbus family info for AM33xx devices
        ARM: dts: omap3: Add missing dmas for crypto
        ARM: dts: rockchip: disable gmac by default in rk3288.dtsi
        MAINTAINERS: add rockchip regexp to the ARM/Rockchip entry
        ARM: pxa: fix pxa interrupts handling in DT
        ARM: pxa: Fix typo in zeus.c
        ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage
    • Merge tag 'sunxi-fixes-for-4.0' of... · 4550bdb0
      Olof Johansson authored
      Merge tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes
      
      Allwinner fixes for 4.0
      
      There are a few fixes to merge for 4.0, one to add a select in the machine
      Kconfig option to fix a potential build failure, and two fixing cpufreq related
      issues.
      
      * tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
        ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
        ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
        ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'fixes-v4.0-rc4' of... · b1dae3d8
      Olof Johansson authored
      Merge tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
      
      Fixes for omaps for the -rc cycle:
      
      - Fix a device tree based booting vs legacy booting regression for
        omap3 crypto hardware by adding the missing DMA channels.
      
      - Fix /sys/bus/soc/devices/soc0/family for am33xx devices.
      
      - Fix two timer issues that can cause hangs if the timer related
        hwmod data is missing like it often initially is for new SoCs.
      
      - Remove pcie hwmods entry from dts as that causes runtime PM to
        fail for the PHYs.
      
      - A paper bag type dts configuration fix for dm816x GPIO
        interrupts that I just noticed. This is most of the changes
        diffstat wise, but as it's a basic feature for connecting
        devices and things work otherwise, it should be fixed.
      
      * tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
        ARM: dts: Fix gpio interrupts for dm816x
        ARM: dts: dra7: remove ti,hwmod property from pcie phy
        ARM: OMAP: dmtimer: disable pm runtime on remove
        ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
        ARM: OMAP2+: Fix socbus family info for AM33xx devices
        ARM: dts: omap3: Add missing dmas for crypto
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next into fixes · ebc0aa8f
      Olof Johansson authored
      Late fix for v4.0 on the SoCFPGA platform:
      - Fix interrupt number for SPI1 interface
      
      * tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next:
        ARM: socfpga: dts: fix spi1 interrupt
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • arm64: juno: Fix misleading name of UART reference clock · 78d84bc3
      Dave Martin authored
      The UART reference clock speed is 7273.8 kHz, not 72738 kHz.
      
      Dots aren't usually used in node names even though ePAPR permits
      them.  However, this can easily be avoided by expressing the
      frequency in Hz, not kHz.
      
      This patch changes the name to refclk7273800hz, reflecting the
      actual clock speed.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
      Signed-off-by: Olof Johansson <olof@lixom.net>